As a statistics PhD candidate at the University of Auckland, I spend my days researching how to model buses in real time, and how to use those models to make better predictions of arrival and travel times.
As the lead developer of iNZight, I'm responsible for developing and maintaining a data visualisation and analysis tool used by New Zealand schools, universities, and various other individuals and organisations around the world.
I've completed all of my study at the University of Auckland. I originally studied biological sciences with a marine science specialisation, but was swiftly coerced into switching to a double major in biology and statistics. While I enjoyed most of my biology papers, I found I enjoyed statistics more (it is more enjoyable, and they introduced me to this thing called R). Therefore, I didn't hesitate to enrol in postgraduate study in statistics.
The main goal of my current work is to model Auckland's entire public transport system in real time, in order to make reliable predictions about arrival time. The first piece of the puzzle is a particle filter model that estimates the speed (and other variables) of every bus in the Auckland public transport network from its observed GPS positions. The second piece is a Kalman filter that uses bus speeds from the particle filter to estimate the speed along individual segments of road. Finally, the estimated speeds will be used to predict how long it will take for each bus to travel from its current position to future stops.
What is a particle filter? Basically, parallel universes. But instead of universes, we have buses! A whole fleet of imaginary (but plausible) buses, all traveling along the route at different speeds, some stopping at bus stops and traffic lights while others don't. Then, whenever we receive a GPS position update from the real bus, we delete any imaginary buses that are no longer plausible (i.e., those that are not near the observed location).
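To make the idea concrete, here is a minimal sketch in Python of one way such a filter could look. It is an illustration under heavy assumptions, not the model used in this research: each imaginary bus is reduced to a (distance along route, speed) pair, the GPS observation is treated as a distance along the route, and the function names, noise levels, and the simulated 10 m/s bus are all made up for the example. Rather than deleting implausible buses outright, it resamples the fleet so that buses near the observation are far more likely to survive.

```python
import math
import random


def step_particles(particles, dt, rng, speed_noise=2.0, max_speed=30.0):
    """Move every imaginary bus forward by dt seconds at a randomly perturbed speed."""
    moved = []
    for distance, speed in particles:
        # Hypothetical noise model: speed wanders a little, clamped to [0, max_speed].
        new_speed = min(max(speed + rng.gauss(0.0, speed_noise), 0.0), max_speed)
        moved.append((distance + new_speed * dt, new_speed))
    return moved


def resample(particles, observed_distance, rng, gps_sd=20.0):
    """Keep imaginary buses in proportion to how close they are to the observation."""
    weights = [
        math.exp(-0.5 * ((distance - observed_distance) / gps_sd) ** 2)
        for distance, _ in particles
    ]
    if sum(weights) == 0.0:
        return particles  # no particle is plausible; leave the fleet unchanged
    return rng.choices(particles, weights=weights, k=len(particles))


def estimate_speed(particles):
    """Average speed of the surviving imaginary buses."""
    return sum(speed for _, speed in particles) / len(particles)


rng = random.Random(1)
# Start with 2000 imaginary buses at the route origin with speeds of 0 to 30 m/s.
particles = [(0.0, rng.uniform(0.0, 30.0)) for _ in range(2000)]

# Simulated "real" bus: travels at 10 m/s and reports its position every 10 s.
for k in range(1, 6):
    particles = step_particles(particles, dt=10.0, rng=rng)
    particles = resample(particles, observed_distance=100.0 * k, rng=rng)

estimated = estimate_speed(particles)
```

After a handful of observations the surviving buses cluster around the real bus, so their average speed serves as an estimate of how fast it is actually going.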
And how about that Kalman filter you mentioned? This is just a way of updating a previous estimate when we receive a new observation, allowing the state (speed along a section of road) to change over time, but also allowing for uncertainty in the measurements (bus speeds). The more buses that recently traveled down a piece of road, the better our estimate of the average speed—and how variable it is: did the buses travel at similar speeds, or were they quite different?
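As a rough illustration, the update for a single road segment can be sketched in Python as a one-dimensional Kalman filter. This is a simplified sketch rather than the model used in this research: the function name `kalman_update`, the variance values, and the example speeds are all assumptions chosen for the example.

```python
def kalman_update(mean, var, obs, obs_var, process_var):
    """One predict-and-update step for the speed along a single road segment.

    mean, var    -- previous estimate of the segment speed and its variance
    obs, obs_var -- newly observed bus speed and its measurement variance
    process_var  -- how much the true speed may have drifted since the last update
    """
    # Predict: the road speed may have changed, so our uncertainty grows.
    pred_var = var + process_var
    # Update: weight the new observation by how much we trust it.
    gain = pred_var / (pred_var + obs_var)
    new_mean = mean + gain * (obs - mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var


# Hypothetical example: a vague prior of 12 m/s, then three buses report in.
mean, var = 12.0, 25.0
for bus_speed in [8.0, 9.0, 8.5]:
    mean, var = kalman_update(mean, var, bus_speed, obs_var=4.0, process_var=1.0)
```

Each new bus pulls the estimate towards its observed speed and shrinks the variance, while `process_var` stops the filter from becoming overconfident when no buses have passed for a while.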
The size-selectivity of fishing gear is of particular interest to fisheries managers, aiding in the reduction of by-catch and increase of profits. Size-selectivity models have typically been fitted using GLMs and mixed-effects models implemented using software such as SAS. We investigate a Bayesian approach to fitting these models, with special focus on the comparability of frequentist and Bayesian models. We look at several case studies, which are used to establish the validity of our models by comparing our results to those published in previous analyses. MCMC diagnostic tests and PPCs for overdispersion are explored, and random-effects are shown to be the preferred method of modelling overdispersion in the data. Having formulated a general model, several extensions are investigated. The Poisson distribution is used as an alternative likelihood function, allowing the models to make use of any population size distribution information, and multiple random-effects models are implemented that were previously found to be too complex. Semiparametric selection curves, notably basis splines, are used with the constraint that they be continuous, non-decreasing functions of length. A new R package, BSM, is introduced as a simple tool for fitting many of the Bayesian selectivity models discussed in this thesis.
Abstract: Reverberation mapping is a technique used to infer the lag, τ, between two light curves obtained from two different regions of an AGN. The lag can later be used in the estimation of the mass of the central black hole. Traditional reverberation mapping methods involve the use of the CCF, and more recently Bayesian models, to estimate the lag; however, these require long, expensive observational campaigns to obtain the data necessary to infer the lag accurately. The difficulty in obtaining the data has led to the development of a new technique where the CCFs from multiple poorly-measured AGN are combined, or stacked, to infer a typical lag for the sample. Through simulation, this method was shown to recover the lag, although the population lag variability and uncertainty of the lag estimate due to the sparse data were difficult to differentiate from the stacked CCF. Therefore, we proposed using a Bayesian hierarchical model as an alternative, with the advantage of obtaining individual estimates for population variability and estimation error. After first implementing a simple model to show our method could be used to recover the population mean lag, we constructed a model that would analyse multiple AGN and infer the population distribution of lags. We were able to recover the population parameters, and their associated uncertainties, and show that the population variability was separable from the uncertainty in estimating the mean lag.
I'm currently the lead designer, developer, and maintainer of iNZight, a free, easy-to-use data exploration, visualisation, and analysis tool targeted towards users with little or no prior statistics experience, allowing them to produce complex graphs with ease. iNZight is developed by students here at the University of Auckland, usually as part of a project, which is overseen by Chris Wild.
I started working on the project in the middle of 2013, and in the four years since I have managed to contribute to all of its components, most notably the graphics package (iNZightPlots) and, more recently, the main GUI. I've also been responsible for maintaining the 10 R packages behind iNZight, the R repository, the Windows, Mac, and Linux installers, and the website.