• Analysis of GNAC Volleyball using the Bradley-Terry Model

      Karwoski, Daniel; Short, Margaret; Goddard, Scott; McIntyre, Julie; Barry, Ron (2020-05)
      Ranking is the process by which a set of objects is assigned a linear ordering based on some property that they possess. Not surprisingly, there are many different methods of ranking used in a wide array of applications; ranking plays a vital role in sports analysis, preference testing, search engine optimization, psychological research, and many other areas. One of the more popular ranking models is Bradley-Terry, a type of aggregation ranking that has been used mostly within the realm of sports. Bradley-Terry uses the outcomes of individual matchups (paired comparisons) to create rankings via maximum-likelihood estimation. This project aims to briefly examine the motivation for modeling sporting events, review the history of ranking and aggregation ranking, communicate the mathematical theory behind the Bradley-Terry model, and apply the model to a novel volleyball dataset.
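      To make the estimation step concrete, the following is a minimal sketch (not the project's code) of the classical MM iteration for the Bradley-Terry maximum-likelihood fit, applied to a hypothetical win matrix:

      ```python
      # Minimal Bradley-Terry fit via the standard MM algorithm (Zermelo/Hunter).
      # Hypothetical data: wins[i][j] = number of matches team i won against team j.
      import numpy as np

      def bradley_terry(wins, n_iter=1000, tol=1e-10):
          """Return maximum-likelihood strength scores from a pairwise win matrix."""
          wins = np.asarray(wins, dtype=float)
          games = wins + wins.T              # total matchups between each pair
          w = wins.sum(axis=1)               # total wins per team
          p = np.ones(len(wins))
          for _ in range(n_iter):
              denom = games / (p[:, None] + p[None, :])   # n_ij / (p_i + p_j)
              np.fill_diagonal(denom, 0.0)
              p_new = w / denom.sum(axis=1)
              p_new /= p_new.sum()           # fix the scale; only ratios matter
              if np.max(np.abs(p_new - p)) < tol:
                  return p_new
              p = p_new
          return p

      wins = [[0, 3, 5], [1, 0, 2], [2, 4, 0]]    # toy three-team schedule
      print(np.argsort(-bradley_terry(wins)))     # ranking, strongest team first
      ```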
    • Analyzing tree distribution and abundance in Yukon-Charley Rivers National Preserve: developing geostatistical Bayesian models with count data

      Winder, Samantha; Short, Margaret; Roland, Carl; Goddard, Scott; McIntyre, Julie (2018-05)
      Species distribution models (SDMs) describe the relationship between where a species occurs and the underlying environmental conditions. For this project, I created SDMs for the five tree species that occur in Yukon-Charley Rivers National Preserve (YUCH) in order to gain insight into which environmental covariates are important for each species, and what effect each environmental condition has on that species' expected occurrence or abundance. I discuss some of the issues involved in creating SDMs, including whether or not to incorporate spatially explicit error terms, and if so, how to do so with generalized linear models (GLMs), whose responses here are discrete counts. I ran a total of 10 distinct geostatistical SDMs fitted by Markov chain Monte Carlo (Bayesian methods), and discuss the results here. I also compare these results from YUCH with results from a similar analysis conducted in Denali National Park and Preserve (DNPP).
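      For count responses, geostatistical SDMs of this kind take a generic form worth stating (a standard specification, shown for orientation rather than as the exact models fitted):

      $$\log \mathbb{E}[y(s_i)] = x(s_i)^\top \beta + w(s_i), \qquad w(\cdot) \sim \mathrm{GP}\bigl(0,\, C_\theta(\cdot,\cdot)\bigr),$$

      where $y(s_i)$ is the count at location $s_i$, $x(s_i)$ holds the environmental covariates, and $w$ is a spatially correlated random effect; dropping $w$ gives the non-spatial alternative weighed above.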
    • An application of an integrated population model: estimating population size of the Fortymile caribou herd using limited data

      Inokuma, Megumi; Short, Margaret; Barry, Ron; Goddard, Scott (2017-05)
      An Integrated Population Model (IPM) was employed to estimate the population size of the Fortymile caribou herd (FCH), utilizing multiple types of biological data. Current population size estimates of the FCH are made by the Alaska Department of Fish and Game (ADF&G) using an aerial photo census technique. Taking aerial photos for the counts requires certain environmental conditions, such as swarms of mosquitoes that drive the majority of caribou into wide open spaces, and favorable weather that allows low-altitude flying in mid-June. These conditions have not been met in some recent years, so there is no count estimate for those years. IPMs are considered an alternative method for estimating population size. IPMs contain three components: a stochastic component that explains the relationship between biological information and population size; demographic models that derive parameters from independently conducted surveys; and a link between IPM estimates and observed-count estimates. In this paper, we combine census count data, parturition data, calf and adult female survival data, and sex composition data, all collected by ADF&G between 1990 and 2016. During this period there were 13 years, including two five-consecutive-year stretches, for which no photo census count estimates were available. We estimate the missing counts and the associated uncertainty using a Bayesian IPM. Our case study shows that IPMs are capable of estimating population size for years with missing count data when other biological data are available. We suggest that sensitivity analyses be done to learn the relationship between the amount of data and the accuracy of the estimates.
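      Schematically, an IPM of this kind couples a process model for the latent herd size with an observation model for the photo census, while survival and recruitment parameters are informed by the parturition, survival, and composition surveys. A generic skeleton (not the authors' exact specification) is

      $$N_{t+1} \mid N_t \sim \mathrm{Poisson}\bigl(N_t\,(\phi_{\mathrm{ad}} + \rho\,\phi_{\mathrm{calf}})\bigr), \qquad y_t \mid N_t \sim \mathrm{Normal}(N_t,\, \sigma^2_{\mathrm{obs}}),$$

      where $\phi$ denotes survival, $\rho$ recruitment, and $y_t$ the photo census count; years with no census simply contribute no $y_t$, yet $N_t$ remains estimable through the process model.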
    • An application of Bayesian variable selection to international economic data

      Tian, Xiang; Goddard, Scott; Barry, Ron; Short, Margaret; McIntyre, Julie (2017-06)
      GDP plays an important role in people's lives; for example, when GDP increases, the unemployment rate frequently decreases. In this project, we use four different Bayesian variable selection methods to verify economic theory regarding important predictors of GDP. The four methods are: g-prior variable selection with credible intervals, local empirical Bayes with credible intervals, variable selection by indicator function, and hyper-g prior variable selection. We then use four measures to compare the results of the various Bayesian variable selection methods: AIC, BIC, adjusted R-squared, and cross-validation.
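      For orientation, the appeal of the g-prior here is that each candidate model can be scored in closed form; with fixed $g$, the Bayes factor of model $M_\gamma$ (with $p_\gamma$ predictors and coefficient of determination $R_\gamma^2$) against the intercept-only model is the standard result

      $$\mathrm{BF}(M_\gamma : M_0) \;=\; (1+g)^{(n-1-p_\gamma)/2}\,\bigl[1 + g\,(1 - R_\gamma^2)\bigr]^{-(n-1)/2},$$

      and the hyper-g method places a prior on $g$ rather than fixing it.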
    • Assessing year to year variability of inertial oscillation in the Chukchi Sea using the wavelet transform

      Leonard, David (2016-05)
      Three years of ocean drifter data from the Chukchi Sea were examined using the wavelet transform to investigate inertial oscillation. There was an increasing trend in the number and duration of inertial oscillation events, and hence in the total proportion of time spent in them. Additionally, the Chukchi Sea seems to facilitate inertial oscillation that is easier to discern in north-south velocity records than in east-west velocity records. The data used in this analysis were transformed using wavelets, which are generally used as a qualitative statistical method. Because of this, in addition to measurement error and random ocean noise, there is an additional source of variability and correlation that makes concrete statistical results challenging to obtain. Nevertheless, wavelets were an effective tool for isolating the specific period of inertial oscillation and examining how it changed over time.
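      As a sketch of how a continuous wavelet transform isolates the inertial band (using PyWavelets; the drifter series, latitude, and sampling interval below are hypothetical stand-ins):

      ```python
      # Continuous wavelet transform of a north-south drifter velocity series,
      # then extraction of power near the local inertial frequency.
      import numpy as np
      import pywt

      dt_hours = 1.0                                   # assumed hourly sampling
      lat = 70.0                                       # nominal Chukchi Sea latitude
      T_inertial = 11.9672 / np.sin(np.deg2rad(lat))   # inertial period, hours

      v = np.random.default_rng(0).standard_normal(24 * 90)   # stand-in velocities
      coef, freqs = pywt.cwt(v, np.arange(1, 64), "morl", sampling_period=dt_hours)
      band = np.abs(freqs - 1.0 / T_inertial).argmin()  # scale nearest inertial freq
      inertial_power = np.abs(coef[band]) ** 2          # inertial-band power vs time
      ```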
    • A Bayesian mixed multistate open-robust design mark-recapture model to estimate heterogeneity in transition rates in an imperfectly detected system

      Badger, Janelle J.; McIntyre, Julie; Barry, Ron; Goddard, Scott; Breed, Greg (2020-12)
      Multistate mark-recapture models have long been used to assess ecological and demographic parameters such as survival, phenology, and breeding rates by estimating transition rates among a series of latent or observable states. Here, we introduce a Bayesian mixed multistate open robust design mark-recapture model (MSORD), with random intercepts and slopes to explore individual heterogeneity in transition rates and individual responses to covariates. We fit this model to simulated data sets to test whether the model could accurately and precisely estimate five parameters, set to known values a priori, under varying sampling schemes. To assess the behavior of the model integrated across replicate fits, we employed a two-stage hierarchical model-fitting algorithm for each of the simulations. The majority of model fits showed no sign of inadequate convergence according to our metrics, with 81.25% of replicate posteriors for parameters of interest having general agreement among chains (R̂ < 1.1). Estimates of posterior distributions for mean transition rates and the standard deviation of the random intercepts were generally well defined. However, the models estimated the standard deviation of the random slopes and the correlation among random effects relatively poorly, especially in simulations with low power to detect individuals (e.g. low detection rates, short study duration, or few secondary samples). We also apply this model to a dataset of 200 female grey seals breeding on Sable Island from 1985 to 2018 to estimate individual heterogeneity in reproductive rate and in response to near-exponential population growth. The Bayesian MSORD estimated substantial variation among individuals in both mean transition rates and responses to population size. The correlation among effects trended positive, indicating that females with high reproductive performance (more positive intercept) also tended to respond better to population growth (more positive slope), and vice versa. Though our simulation results lend confidence to analyses using this method on well-developed datasets from highly observable systems, we caution against using this framework in sparse-data situations.
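      In outline, the individual heterogeneity enters through correlated random intercepts and slopes on the transition scale (a schematic of the structure, not the full MSORD specification):

      $$\operatorname{logit}(\psi_{i,t}) = (\mu + \alpha_i) + (\beta + b_i)\,x_t, \qquad (\alpha_i, b_i)^\top \sim \mathrm{MVN}(\mathbf{0},\, \Sigma),$$

      where $\psi_{i,t}$ is individual $i$'s probability of transitioning to the breeding state in year $t$, $x_t$ is a covariate such as population size, and the off-diagonal element of $\Sigma$ carries the intercept-slope correlation discussed above.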
    • Bayesian predictive process models for historical precipitation data of Alaska and southwestern Canada

      Vanney, Peter; Short, Margaret; Goddard, Scott; Barry, Ronald (2016-05)
      In this paper we apply hierarchical Bayesian predictive process models to historical precipitation data using the spBayes R package. Classical and hierarchical Bayesian techniques for spatial analysis and modeling require large matrix inversions and decompositions, which can take prohibitive amounts of time to run (for n observations, on the order of n³ operations). Bayesian predictive process models share the spatial framework of hierarchical Bayesian models but project the process onto a small set of locations (called knots), which allows for large-scale dimension reduction and results in much smaller matrix inversions and faster computing times. These computationally less expensive models allow average desktop computers to analyze spatially referenced datasets in excess of 20,000 observations in an acceptable amount of time.
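      To illustrate the mechanics (with a made-up exponential covariance and random locations; spBayes handles all of this internally), the key point is that only the m × m knot covariance is ever inverted:

      ```python
      # Sketch of the predictive-process dimension reduction.
      import numpy as np
      from scipy.spatial.distance import cdist

      rng = np.random.default_rng(1)
      s = rng.uniform(0, 100, size=(5000, 2))      # n = 5000 observation locations
      knots = rng.uniform(0, 100, size=(64, 2))    # m = 64 knots, m << n

      def expcov(a, b, sigma2=1.0, phi=0.05):
          return sigma2 * np.exp(-phi * cdist(a, b))

      C_sk = expcov(s, knots)                      # n x m cross-covariance
      C_kk = expcov(knots, knots)                  # m x m: the only matrix inverted
      # The low-rank covariance C_sk C_kk^{-1} C_ks replaces the full n x n
      # matrix, so the O(n^3) inversion becomes O(m^2 n) work.
      A = np.linalg.solve(C_kk, C_sk.T)            # m x n
      pp_var = np.einsum('ij,ji->i', C_sk, A)      # pointwise process variance
      ```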
    • Climate drivers of Interior Alaska wildland fire

      Bukhader, Maryam; Bhatt, Uma S.; Mölders, C. Nicole; Panda, Santosh; Rupp, T. Scott (2020-05)
      This study focused on the climate drivers of summer (June-August) wildfire in Interior Alaska over the period 1994 to 2017. The analysis identifies links between meteorological variables and area burned, in the context of spatial and temporal variability at the Predictive Service Area (PSA) level. Warmer temperatures were associated with a higher chance of wildland fire, as in summer 2004 (26,797 km²), when temperatures reached their highest levels of the study period. The study also shows that temperature follows the same seasonal cycle in all PSAs: it begins to rise in June, peaks in July, and then gradually declines, consistent with the fire season. Although precipitation limits the growth of forest fires, the lightning that accompanies it increases the chance of ignition, giving precipitation a dual role in fire risk. This is clearest in the Upper Yukon Valley (AK02) and Tanana Zone South (AK03S), which record the largest numbers of lightning strikes in Interior Alaska (roughly 17,000 and 11,000 strikes, respectively) and also the greatest area burned (1,441.2 and 1,112.4 km²). There is an upward trend in both temperature and precipitation in all months, especially May and September, indicating a shorter snow season and a longer fire season. A consistent east-west contrast was also documented: eastern PSAs (AK01W, AK01E, AK02, AK03N, AK03S) receive their highest precipitation in July, and western PSAs (AK04, AK05, AK07) in August. The years 2004, 2015, 2005, and 2009 had the largest area burned, under extremely warm and dry conditions; 2004 alone burned approximately 26,797 km² (6.6 million acres).
    • A comparison of discrete inverse methods for determining parameters of an economic model

      Jurkowski, Caleb; Maxwell, David; Short, Margaret; Bueler, Edward (2017-08)
      We consider a time-dependent spatial economic model for capital in which the region's production function is a parameter. This forward model predicts the distribution of capital in a region from that region's production function. We solve the inverse problem based on this model: given data describing the capital of a region, we wish to determine the production function through discretization. Inverse problems are generally ill-posed, which in this case means that if the data describing the capital are changed slightly, the solution of the inverse problem could change dramatically. The solution we seek is therefore a probability distribution over parameters. However, this probability distribution is complex, and at best we can describe some of its features. We characterize the solution to this inverse problem using two different techniques, Markov chain Monte Carlo (the Metropolis algorithm) and least-squares optimization, and compare summary statistics coming from each method.
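      For reference, the Metropolis algorithm at the core of the MCMC approach is short; this is a generic random-walk sketch (the log-posterior and starting point are placeholders, not the paper's economic model):

      ```python
      # Generic random-walk Metropolis sampler for an arbitrary log-posterior.
      import numpy as np

      def metropolis(log_post, theta0, n_samples=10000, step=0.1, seed=0):
          rng = np.random.default_rng(seed)
          theta = np.asarray(theta0, dtype=float)
          lp = log_post(theta)
          chain = np.empty((n_samples, theta.size))
          for i in range(n_samples):
              prop = theta + step * rng.standard_normal(theta.size)
              lp_prop = log_post(prop)
              if np.log(rng.uniform()) < lp_prop - lp:   # accept w.p. min(1, ratio)
                  theta, lp = prop, lp_prop
              chain[i] = theta                           # repeat point if rejected
          return chain

      # e.g. a standard-normal toy posterior in three dimensions:
      chain = metropolis(lambda t: -0.5 * np.sum(t ** 2), np.zeros(3))
      ```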
    • Comparison of lower body segment alignment of elite level hockey players to age-matched non-hockey players

      Kimbal, Jim R.; Bult-Ito, Abel; Taylor, Barbara; Duffy, Lawrence (2015-12)
      Lower-body overuse injuries and injuries of insidious onset are thought to have an underlying biomechanical component that predisposes athletes to injury. The purpose of this study was to compare lower-body biomechanical characteristics of elite hockey players with those of matched controls. I hypothesized that elite hockey players have a greater degree of anterior pelvic tilt, a greater varus knee angle, a higher foot arch, and feet held more parallel during gait than a matched non-skating population. Measurements were taken of elite-level, college-aged male hockey players and compared with those of cross-country runners (ten subjects in each group), who served as controls, for trunk angle, pelvic tilt angle, knee alignment (varus/valgus angle), foot angle, arch index (arch height), hip center of range of motion, hip external rotation, hip internal rotation, hip total range of motion (ROM), knee transverse-plane ROM, and step width. The results support the hypothesis for anterior pelvic tilt and foot angle during gait. Although knee angle was in the expected varus direction, the difference was not significant, and no between-group differences were observed in foot arch. All other measurements not directly related to the hypothesis were not significantly different, with the exception of mean step width. These results matter because recent literature describes a lower-body posture of medial collapse into "dynamic valgus" as predisposing to injury. On the spectrum from lower-body varus to lower-body valgus, hockey players fall on the varus side in all attributes except arch height, which was similar in both populations. Since lower-body alignment is thought to be coupled, this inconsistency appears contrary to the "medial collapse into dynamic valgus" model and may explain why foot orthotics and athletic shoes used as injury interventions often fail.
    • Edge detection using Bayesian process convolutions

      Lang, Yanda; Short, Margaret; Barry, Ron; Goddard, Scott; McIntyre, Julie (2017-05)
      This project describes a method for edge detection in images. We develop a Bayesian approach for edge detection using a process convolution model. Our method has some advantages over the classical Sobel edge detector; in particular, our Bayesian spatial detector works well for rich, but noisy, photos. We first demonstrate our approach with a small simulation study, then with a richer photograph. Finally, we show that the Bayesian edge detector gives a considerable performance improvement over the Sobel operator for rich photos.
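      For context, the Sobel baseline referred to above takes only a few lines with SciPy (the Bayesian process-convolution detector is the project's contribution and is not reproduced here):

      ```python
      # Classical Sobel edge detection: gradient magnitude of the image.
      import numpy as np
      from scipy import ndimage

      def sobel_edges(img):
          gx = ndimage.sobel(img, axis=1)   # horizontal gradient
          gy = ndimage.sobel(img, axis=0)   # vertical gradient
          return np.hypot(gx, gy)           # threshold this to obtain edges
      ```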
    • Effect of filling methods on the forecasting of time series with missing values

      Cheng, Mingyuan (2014-12)
      The Gulf of Alaska Mooring (GAK1) monitoring data set is an irregular time series of temperature and salinity at various depths in the Gulf of Alaska. One approach to analyzing an irregular time series is to regularize it by imputing, or filling in, the missing values. In this project we investigated and compared four methods of doing this (denoted APPROX, SPLINE, LOCF, and OMIT). Simulation was used to evaluate the performance of each filling method in terms of parameter estimation and forecasting precision for an Autoregressive Integrated Moving Average (ARIMA) model. Simulations showed differences among the four methods in forecast precision and parameter-estimate bias, and these differences depended on the true values of the model parameters as well as on the percentage of data missing. Among the four methods, OMIT performed best and SPLINE worst. We also illustrate the application of the four methods to forecasting the GAK1 monitoring time series, and discuss the results.
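      For orientation, the four filling methods correspond to standard one-line operations on a time series (a sketch with made-up values; the names mirror the R conventions the project's labels suggest):

      ```python
      # The four filling strategies applied to a toy series with gaps.
      import numpy as np
      import pandas as pd

      ts = pd.Series([10.1, np.nan, 9.8, 9.9, np.nan, 10.0, 10.4],
                     index=pd.date_range("2000-01-01", periods=7, freq="D"))

      approx = ts.interpolate(method="linear")           # APPROX: linear interpolation
      spline = ts.interpolate(method="spline", order=3)  # SPLINE: cubic spline
      locf   = ts.ffill()                                # LOCF: carry last value forward
      omit   = ts.dropna()                               # OMIT: drop missing values
      ```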
    • Estimating confidence intervals on accuracy in classification in machine learning

      Zhang, Jesse; McIntyre, Julie; Barry, Ronald; Goddard, Scott (2019-04)
      This paper explores various techniques for estimating a confidence interval on accuracy for machine learning algorithms; such intervals may be used to rank the algorithms. We investigate bootstrapping, leave-one-out cross-validation, and conformal prediction. These techniques are applied to the following machine learning algorithms: support vector machines, bagging, AdaBoost, and random forests. Confidence intervals are produced for a total of nine datasets, three real and six simulated. We found that, in general, no technique was particularly successful at consistently capturing the true accuracy; however, leave-one-out cross-validation was the most consistent of the techniques across all datasets.
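      As a concrete example of one technique above, a percentile-bootstrap confidence interval on test-set accuracy can be computed as follows (the dataset and classifier are illustrative stand-ins):

      ```python
      # Percentile-bootstrap 95% confidence interval on classification accuracy.
      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import train_test_split

      X, y = make_classification(n_samples=500, random_state=0)
      Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
      hits = RandomForestClassifier(random_state=0).fit(Xtr, ytr).predict(Xte) == yte

      rng = np.random.default_rng(0)
      boot = [hits[rng.integers(0, len(hits), len(hits))].mean() for _ in range(2000)]
      print(np.percentile(boot, [2.5, 97.5]))   # interval on the accuracy estimate
      ```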
    • Extending the Lattice-Based Smoother using a generalized additive model

      Rakhmetova, Gulfaya; McIntyre, Julie; Short, Margaret; Goddard, Scott (2017-12)
      The Lattice-Based Smoother was introduced by McIntyre and Barry (2017) to estimate a surface defined over an irregularly shaped region. In this paper we extend their method to allow for additional covariates and non-continuous responses. We describe our extension, which utilizes the framework of generalized additive models. A simulation study shows that our method is comparable to the soap film smoother of Wood et al. (2008) under a number of different conditions. Finally, we illustrate the method's practical use by applying it to a real data set.
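      Schematically, the extension replaces the purely spatial fit with an additive predictor (a generic GAM form, not the paper's exact notation):

      $$g\bigl(\mathbb{E}[y_i]\bigr) = f_{\mathrm{lattice}}(s_i) + \sum_j f_j(x_{ij}),$$

      where $f_{\mathrm{lattice}}$ is the lattice-based spatial smooth over the irregular region, the $f_j$ are smooths of the additional covariates, and the link $g$ accommodates non-continuous responses.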
    • Gaussian process convolutions for Bayesian spatial classification

      Best, John K.; Short, Margaret; Goddard, Scott; Barry, Ron; McIntyre, Julie (2016-05)
      We compare three models for their ability to perform binary spatial classification, using a geospatial data set consisting of observations that are either permafrost or not. All three models use an underlying Gaussian process. The first model treats this process as the log-odds of a positive classification (i.e., as permafrost). The second model uses a cutoff: locations where the process is positive are classified positively, and locations where it is negative are classified negatively; a probability of misclassification then gives the likelihood. The third model depends on two separate processes, the first representing a positive classification and the second a negative one; whichever process has the greater value at a location provides the classification, and a probability of misclassification is again used to formulate the likelihood. In all three cases, realizations of the underlying Gaussian processes were generated using a process convolution: a grid of knots (whose values were sampled using Markov chain Monte Carlo) was convolved with an anisotropic Gaussian kernel. All three models provided adequate classifications, but the cutoff and two-process models showed much tighter bounds on the border between the two states.
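      The process-convolution construction itself is compact. The sketch below uses an isotropic Gaussian kernel and fixed knot values for brevity, whereas the models above used an anisotropic kernel and sampled the knot values by MCMC:

      ```python
      # A Gaussian process built by convolving knot values with a Gaussian kernel.
      import numpy as np

      knots = np.array([(i, j) for i in range(10) for j in range(10)], dtype=float)
      z = np.random.default_rng(0).standard_normal(len(knots))   # knot values

      def field(s, bandwidth=1.5):
          """Evaluate the convolved process at a 2-D location s."""
          d2 = np.sum((knots - s) ** 2, axis=1)
          k = np.exp(-d2 / (2 * bandwidth ** 2))   # Gaussian kernel weights
          return k @ z

      print(field(np.array([4.2, 7.7])))
      ```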
    • A geostatistical model based on Brownian motion to Krige regions in R2 with irregular boundaries and holes

      Bernard, Jordy; McIntyre, Julie; Barry, Ron; Goddard, Scott (2019-05)
      Kriging is a geostatistical interpolation method that produces predictions and prediction intervals. Classical kriging models use Euclidean (straight-line) distance when modeling spatial autocorrelation. However, for estuaries, inlets, and bays, shortest-in-water distance may capture the system's proximity dependencies better than Euclidean distance when boundary constraints are present. Shortest-in-water distance has been used to krige such regions (Little et al., 1997; Rathbun, 1998); however, the variance-covariance matrices used in these models have not been shown to be mathematically valid. In this project, a new kriging model is developed for irregularly shaped regions in R². This model incorporates the notion of flow-connected distance into a valid variance-covariance matrix through the use of a random walk on a lattice, process convolutions, and the non-stationary kriging equations. The model developed in this paper is compared to existing methods of spatial prediction over irregularly shaped regions using water quality data from Puget Sound.
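      For orientation, once a valid covariance matrix is in hand, prediction proceeds through the standard (simple) kriging equations; the project's contribution is constructing a valid covariance from flow-connected distance:

      $$\hat{Z}(s_0) = \mu + c_0^\top C^{-1}(Z - \mu\mathbf{1}), \qquad \mathrm{Var}\bigl[\hat{Z}(s_0) - Z(s_0)\bigr] = \sigma_0^2 - c_0^\top C^{-1} c_0,$$

      where $C$ is the covariance matrix among the observed sites, $c_0$ the vector of covariances between the prediction site $s_0$ and the data, and $\sigma_0^2$ the process variance at $s_0$.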
    • An investigation into the effectiveness of simulation-extrapolation for correcting measurement error-induced bias in multilevel models

      Custer, Christopher (2015-04)
      This paper is an investigation into correcting the bias introduced by measurement error in multilevel models. The proposed method for this correction is simulation-extrapolation (SIMEX). The paper begins with a detailed discussion of measurement error and its effects on parameter estimation. We then describe the simulation-extrapolation method and how it corrects for the bias introduced by the measurement error. Multilevel models and their corresponding parameters are also defined before we perform a simulation study. The simulation involves estimating the multilevel model parameters using the true explanatory variables, the error-prone observed variables, and two different SIMEX techniques. The estimates obtained from the true explanatory variables were used as a baseline for comparing the effectiveness of the SIMEX method in correcting bias. From these results, we determined that SIMEX was very effective in correcting the bias in estimates of the fixed-effects parameters, often providing estimates not significantly different from those derived using the true explanatory variables. The simulation also suggested that the SIMEX approach was effective in correcting bias in the random-slope variance estimates, but not in the random-intercept variance estimates. Using the simulation results as a guideline, we then applied the SIMEX approach to an orthodontics dataset to illustrate its application to real data.
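      SIMEX's simulate-then-extrapolate logic fits in a few lines. The sketch below uses hypothetical data and a simple linear regression rather than a multilevel model, with the measurement-error standard deviation assumed known:

      ```python
      # SIMEX for a regression slope: add extra noise at levels lambda, refit,
      # then extrapolate the estimates back to lambda = -1 (no measurement error).
      import numpy as np

      rng = np.random.default_rng(0)
      n, beta, sigma_u = 2000, 1.0, 0.8
      x = rng.standard_normal(n)
      y = beta * x + 0.3 * rng.standard_normal(n)
      w = x + sigma_u * rng.standard_normal(n)     # observed with measurement error

      lams, B, means = np.arange(0.0, 2.1, 0.5), 200, []
      for lam in lams:
          est = [np.polyfit(w + np.sqrt(lam) * sigma_u * rng.standard_normal(n),
                            y, 1)[0] for _ in range(B)]
          means.append(np.mean(est))               # mean slope at each noise level
      coef = np.polyfit(lams, means, 2)            # quadratic extrapolant
      print(np.polyval(coef, -1.0))                # SIMEX slope, near the true 1.0
      ```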
    • Investigation of strongly ducted infrasonic dispersion using a vertical eigenfunction expansion of the Helmholtz equation in a modal broad band acoustic propagation code

      Edon, Robert Alexander; Olson, John V.; Fee, David E.; Szuberla, Curt A. (2015-12)
      This study investigates an infrasound propagation model created by the National Center for Physical Acoustics (NCPA), applied to atmospheric data exhibiting a strong temperature inversion in the lower atmosphere. This temperature inversion is believed to be the primary cause of a dispersed infrasonic signal recorded by an infrasound sensor array on the Southern California coast in August 2012. The received signal is characterized by initial low-frequency content followed by a high-frequency tail. It is shown that the NCPA model is hindered by limited atmospheric data and by the lack of ground truth for the source function that generated the received signal. The model's results do not reproduce the recorded signal and provide inconclusive evidence for infrasonic dispersion.
    • Moose abundance estimation using finite population block kriging on Togiak National Wildlife Refuge, Alaska

      Frye, Graham G. (2016-12)
      Monitoring the size and demographic characteristics of animal populations is fundamental to the fields of wildlife ecology and wildlife management. A diverse suite of population monitoring methods has been developed and employed during the past century, but challenges in obtaining rigorous population estimates remain. I used simulation to address survey design issues for monitoring a moose population at Togiak National Wildlife Refuge in southwestern Alaska using finite population block kriging. In the first chapter, I compared the bias in the Geospatial Population Estimator (GSPE; which uses finite population block kriging to estimate animal abundance) between two survey unit configurations. After finding that substantial bias was induced through the use of the historic survey unit configuration, I concluded that the "standard" unit configuration was preferable because it allowed unbiased estimation. In the second chapter, I examined the effect of sampling intensity on the performance of the GSPE. I concluded that bias and confidence interval coverage were unaffected by sampling intensity, whereas the coefficient of variation (CV) and root mean squared error (RMSE) decreased with increasing sampling intensity. In the final chapter, I examined the effect of spatial clustering by moose on model performance. Highly clustered moose distributions induced a small amount of positive bias, confidence interval coverage lower than the nominal rate, higher CV, and higher RMSE. Some of these issues were ameliorated by increasing sampling intensity, but if highly clustered distributions of moose are expected, then substantially greater sampling intensities than those examined here may be required.
    • Multiple imputation of missing multivariate atmospheric chemistry time series data from Denali National Park

      Charoonsophonsak, Chanachai; Goddard, Scott; Barry, Ronald; McIntyre, Julie; Short, Margaret (2020-05)
      This paper explores a technique for imputing missing values in an incomplete dataset via multiple imputation. Incomplete data is one of the most common issues in data analysis and often arises when measuring chemical and environmental data. The dataset used in the model consists of 26 atmospheric particulates or elements measured semiweekly in Denali National Park from 1988 to 2015. Collections alternated between three and four days apart from 3/2/88 to 9/30/00 and occurred consistently every three days from 10/3/00 to 12/29/15. For this reason, the data were initially partitioned in two, in case the difference in collection intervals had an impact. With further analysis, we concluded that the misalignment between the two periods had little or no impact on our analysis, and we therefore combined them. After running five Markov chains of 1,000 iterations each, we concluded that the model was consistent across the five chains. We found that more exploratory analysis of the imputed datasets would be required to better assess the quality of the imputed values.
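      As an illustration of the general multiple-imputation recipe (using scikit-learn's chained-equations imputer as a stand-in; the project itself ran Bayesian MCMC chains):

      ```python
      # Generate several completed copies of an incomplete 26-variable dataset.
      import numpy as np
      from sklearn.experimental import enable_iterative_imputer  # noqa: F401
      from sklearn.impute import IterativeImputer

      X = np.random.default_rng(0).standard_normal((300, 26))     # stand-in data
      X[np.random.default_rng(1).random(X.shape) < 0.1] = np.nan  # 10% missing

      imputations = [
          IterativeImputer(sample_posterior=True, random_state=m).fit_transform(X)
          for m in range(5)          # five imputed datasets, analyzed and pooled
      ]
      ```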