• Bayesian predictive process models for historical precipitation data of Alaska and southwestern Canada

      Vanney, Peter; Short, Margaret; Goddard, Scott; Barry, Ronald (2016-05)
      In this paper we apply hierarchical Bayesian predictive process models to historical precipitation data using the spBayes R package. Classical and hierarchical Bayesian techniques for spatial analysis and modeling require large matrix inversions and decompositions, which can take prohibitive amounts of time to run (n observations take time on the order of n³). Bayesian predictive process models have the same spatial framework as hierarchical Bayesian models but fit a subset of points (called knots) to the sample, which allows for large-scale dimension reduction and results in much smaller matrix inversions and faster computing times. These computationally less expensive models allow average desktop computers to analyze spatially related datasets in excess of 20,000 observations in an acceptable amount of time.
    • Vertex arboricity of triangle-free graphs

      Warren, Samantha; Gimbel, John; Faudree, Jill; Allman, Elizabeth (2016-05)
      The vertex arboricity of a graph is the minimum number of colors needed to color the vertices so that the subgraph induced by each color class is a forest. In other words, the vertex arboricity of a graph is the smallest number of colors required to color a graph such that every cycle has at least two colors. Although not standard, we will refer to vertex arboricity simply as arboricity. In this paper, we discuss properties of chromatic number and k-defective chromatic number and how those properties relate to the arboricity of triangle-free graphs. In particular, we find bounds on the minimum order of a graph having arboricity three. Equivalently, we consider the largest possible vertex arboricity of triangle-free graphs of fixed order.
    • An existence theorem for solutions to a model problem with Yamabe-positive metric for conformal parameterizations of the Einstein constraint equations

      Knowles, Tyler D.; Maxwell, David; Rhodes, John A.; Rybkin, Alexei (2016-08)
      We use the conformal method to investigate solutions of the vacuum Einstein constraint equations on a manifold with a Yamabe-positive metric. To do so, we develop a model problem with symmetric data on Sⁿ⁻¹ × S¹. We specialize the model problem to a two-parameter family of conformal data, and find that no solutions exist when the transverse-traceless tensor is identically zero. When the transverse-traceless tensor is nonzero, we observe an existence theorem in both the near-constant mean curvature and far-from-constant mean curvature regimes.
    • Moose abundance estimation using finite population block kriging on Togiak National Wildlife Refuge, Alaska

      Frye, Graham G. (2016-12)
      Monitoring the size and demographic characteristics of animal populations is fundamental to the fields of wildlife ecology and wildlife management. A diverse suite of population monitoring methods has been developed and employed during the past century, but challenges in obtaining rigorous population estimates remain. I used simulation to address survey design issues for monitoring a moose population at Togiak National Wildlife Refuge in southwestern Alaska using finite population block kriging. In the first chapter, I compared the bias in the Geospatial Population Estimator (GSPE; which uses finite population block kriging to estimate animal abundance) between two survey unit configurations. After finding that substantial bias was induced through the use of the historic survey unit configuration, I concluded that the “standard” unit configuration was preferable because it allowed unbiased estimation. In the second chapter, I examined the effect of sampling intensity on performance of the GSPE. I concluded that bias and confidence interval coverage were unaffected by sampling intensity, whereas the coefficient of variation (CV) and root mean squared error (RMSE) decreased with increasing sampling intensity. In the final chapter, I examined the effect of spatial clustering by moose on model performance. Highly clustered moose distributions induced a small amount of positive bias, confidence interval coverage lower than the nominal rate, higher CV, and higher RMSE. Some of these issues were ameliorated by increasing sampling intensity, but if highly clustered distributions of moose are expected, then substantially greater sampling intensities than those examined here may be required.
    • Edge detection using Bayesian process convolutions

      Lang, Yanda; Short, Margaret; Barry, Ron; Goddard, Scott; McIntyre, Julie (2017-05)
      This project describes a method for edge detection in images. We develop a Bayesian approach for edge detection, using a process convolution model. Our method has some advantages over the classical edge detector, the Sobel operator. In particular, our Bayesian spatial detector works well for rich, but noisy, photos. We first demonstrate our approach with a small simulation study, then with a richer photograph. Finally, we show that the Bayesian edge detector gives considerable improvement over the Sobel operator for rich photos.
    • An application of an integrated population model: estimating population size of the Fortymile caribou herd using limited data

      Inokuma, Megumi; Short, Margaret; Barry, Ron; Goddard, Scott (2017-05)
      An Integrated Population Model (IPM) was employed to estimate the population size of the Fortymile Caribou herd (FCH), utilizing multiple types of biological data. Current population size estimates of the FCH are made by the Alaska Department of Fish and Game (ADF&G) using an aerial photo census technique. Taking aerial photos for the counts requires certain environmental conditions, such as the existence of swarms of mosquitoes that drive the majority of caribou to wide open spaces, as well as favorable weather conditions, which allow low-altitude flying in mid-June. These conditions have not been met in recent years, so there is no count estimate for those years. IPMs are considered as alternative methods to estimate a population size. IPMs contain three components: a stochastic component that explains the relationship between biological information and population size; demographic models that derive parameters from independently conducted surveys; and a link between IPM estimates and observed-count estimates. In this paper, we combine census count data, parturition data, calf and adult female survival data, and sex composition data, all of which were collected by ADF&G between 1990 and 2016. During this time period, there were 13 years (including two periods of five consecutive years) for which no photo census count estimates were available. We estimate the missing counts and the associated uncertainty using a Bayesian IPM. Our case study shows that IPMs are capable of estimating a population size for years with missing count data when we have other biological data. We suggest that sensitivity analyses be done to learn the relationship between amount of data and the accuracy of the estimates.
    • An application of Bayesian variable selection to international economic data

      Tian, Xiang; Goddard, Scott; Barry, Ron; Short, Margaret; McIntyre, Julie (2017-06)
      GDP plays an important role in people's lives. For example, when GDP increases, the unemployment rate will frequently decrease. In this project, we will use four different Bayesian variable selection methods to verify economic theory regarding important predictors of GDP. The four methods are: g-prior variable selection with credible intervals, local empirical Bayes with credible intervals, variable selection by indicator function, and hyper-g prior variable selection. We will also use four measures to compare the results of the various Bayesian variable selection methods: AIC, BIC, adjusted R-squared, and cross-validation.
    • Linear partial differential equations and real analytic approximations of rough functions

      Barry, Timothy J.; Rybkin, Alexei; Avdonin, Sergei; Faudree, Jill (2017-08)
      Many common approximation methods exist such as linear or polynomial interpolation, splines, Taylor series, or generalized Fourier series. Unfortunately, many of these approximations are not analytic functions on the entire real line, and those that are diverge at infinity and therefore are only valid on a closed interval or for compactly supported functions. Our method takes advantage of the smoothing properties of certain linear partial differential equations to obtain an approximation which is real analytic, converges to the function on the entire real line, and yields particular conservation laws. This approximation method applies to any L₂ function on the real line which may have some rough behavior such as discontinuities or points of nondifferentiability. For comparison, we consider the well-known Fourier-Hermite series approximation. Finally, for some example functions the approximations are found and plotted numerically.
    • A comparison of discrete inverse methods for determining parameters of an economic model

      Jurkowski, Caleb; Maxwell, David; Short, Margaret; Bueler, Edward (2017-08)
      We consider a time-dependent spatial economic model for capital in which the region's production function is a parameter. This forward model predicts the distribution of capital of a region based on that region's production function. We will solve the inverse problem based on this model, i.e., given data describing the capital of a region, we wish to determine the production function through discretization. Inverse problems are generally ill-posed, which in this case means that if the data describing the capital are changed slightly, the solution of the inverse problem could change dramatically. The solution we seek is therefore a probability distribution of parameters. However, this probability distribution is complex, and at best we can describe some of its features. We describe the solution to this inverse problem using two different techniques, Markov chain Monte Carlo (Metropolis algorithm) and least squares optimization, and compare summary statistics coming from each method.
    • A study of saturation number

      Burr, Erika; Faudree, Jill; Williams, Gordon; Berman-Williams, Leah (2017-08)
      This paper seeks to provide complete proofs in modern notation of (early) key saturation number results and give some new results concerning the semi-saturation number. We highlight relevant results from extremal theory and present the saturation number for the complete graph Kₖ and the star K₁,t, elaborating on the proofs provided in the 1964 paper "A Problem in Graph Theory" by Erdős, Hajnal and Moon and the 1986 paper "Saturated Graphs with Minimal Number of Edges" by Kászonyi and Tuza. We discuss the proof of a general bound on the saturation number for a family of target graphs provided by Kászonyi and Tuza. A discussion of related results showing that the complete graph has the maximum saturation number among target graphs of the same order and that the star has the maximum saturation number among target trees of the same order is included. Before presenting our result concerning the semi-saturation number for the path Pₖ, we discuss the structure of some Pₖ-saturated trees of large order as well as the saturation number of Pₖ with respect to host graphs of large order.
    • Extending the Lattice-Based Smoother using a generalized additive model

      Rakhmetova, Gulfaya; McIntyre, Julie; Short, Margaret; Goddard, Scott (2017-12)
      The Lattice-Based Smoother was introduced by McIntyre and Barry (2017) to estimate a surface defined over an irregularly shaped region. In this paper we consider extending their method to allow for additional covariates and non-continuous responses. We describe our extension, which utilizes the framework of generalized additive models. A simulation study shows that our method is comparable to the soap film smoother of Wood et al. (2008) under a number of different conditions. Finally, we illustrate the method's practical use by applying it to a real data set.
    • Toward an optimal solver for the obstacle problem

      Heldman, Max; Bueler, Ed; Maxwell, David; Rhodes, John (2018-04)
      An optimal algorithm for solving a problem with m degrees of freedom is one that computes a solution in O(m) time. In this paper, we discuss a class of optimal algorithms for the numerical solution of PDEs called multigrid methods. We go on to examine numerical solvers for the obstacle problem, a constrained PDE, with the goal of demonstrating optimality. We discuss two known algorithms, the so-called reduced space method (RSP) [BM03] and the multigrid-based projected full-approximation scheme (PFAS) [BC83]. We compare the performance of PFAS and RSP on a few example problems, finding numerical evidence of optimality or near-optimality for PFAS.
    • Reliability analysis of reconstructing phylogenies under long branch attraction conditions

      Dissanayake, Ranjan; Allman, Elizabeth; McIntyre, Julie; Short, Margaret; Goddard, Scott (2018-05)
      In this simulation study we examined the reliability of three phylogenetic reconstruction techniques in a long branch attraction (LBA) situation: Maximum Parsimony (MP), Neighbor Joining (NJ), and Maximum Likelihood. Data were simulated under five DNA substitution models (JC, K2P, F81, HKY, and GTR) from four different taxa. Two branch length parameters of four-taxon trees ranging from 0.05 to 0.75 with an increment of 0.02 were used to simulate DNA data under each model. For each model we simulated DNA sequences with 100, 250, 500 and 1000 sites with 100 replicates. When we have enough data, the maximum likelihood technique is the most reliable of the three methods examined in this study for reconstructing phylogenies under LBA conditions. We also find that MP is the most sensitive to LBA conditions and that Neighbor Joining performs well under LBA conditions compared to MP.
    • Analyzing tree distribution and abundance in Yukon-Charley Rivers National Preserve: developing geostatistical Bayesian models with count data

      Winder, Samantha; Short, Margaret; Roland, Carl; Goddard, Scott; McIntyre, Julie (2018-05)
      Species distribution models (SDMs) describe the relationship between where a species occurs and underlying environmental conditions. For this project, I created SDMs for the five tree species that occur in Yukon-Charley Rivers National Preserve (YUCH) in order to gain insight into which environmental covariates are important for each species, and what effect each environmental condition has on that species' expected occurrence or abundance. I discuss some of the issues involved in creating SDMs, including whether or not to incorporate spatially explicit error terms, and if so, how to do so with generalized linear models (GLMs, which have discrete responses). I ran a total of 10 distinct geostatistical SDMs using Markov Chain Monte Carlo (Bayesian methods), and discuss the results here. I also compare these results from YUCH with results from a similar analysis conducted in Denali National Park and Preserve (DNPP).
    • Testing multispecies coalescent simulators with summary statistics

      Baños Cervantes, Hector Daniel; Allman, Elizabeth; Rhodes, John; Goddard, Scott; McIntyre, Julie; Barry, Ron (2018-12)
      The multispecies coalescent model (MSC) is increasingly used in phylogenetics to describe the formation of gene trees (depicting the direct ancestral relationships of sampled lineages) within species trees (depicting the branching of species from their common ancestor). A number of MSC simulators have been implemented, and these are often used to test inference methods built on the model. However, it is not clear from the literature that these simulators are always adequately tested. In this project, we formulated tools for testing these simulators and used them to show that of four well-known coalescent simulators, Mesquite, Hybrid-Lambda, SimPhy, and Phybase, only SimPhy performs correctly according to these tests.
    • Estimating confidence intervals on accuracy in classification in machine learning

      Zhang, Jesse; McIntyre, Julie; Barry, Ronald; Goddard, Scott (2019-04)
      This paper explores various techniques to estimate a confidence interval on accuracy for machine learning algorithms. Confidence intervals on accuracy may be used to rank machine learning algorithms. We investigate bootstrapping, leave-one-out cross-validation, and conformal prediction. These techniques are applied to the following machine learning algorithms: support vector machines, bagging AdaBoost, and random forests. Confidence intervals are produced on a total of nine datasets, three real and six simulated. We found that, in general, no technique was particularly successful at always capturing the accuracy. However, leave-one-out cross-validation was the most consistent of the techniques across all datasets.
    • On the Klein-Gordon equation originating on a curve and applications to the tsunami run-up problem

      Gaines, Jody; Rybkin, Alexei; Bueler, Ed; Nicolsky, Dmitry (2019-05)
      Our goal is to study the linear Klein-Gordon equation in matrix form, with initial conditions originating on a curve. This equation has applications to the Cross-Sectionally Averaged Shallow Water equations, i.e. a system of nonlinear partial differential equations used for modeling tsunami waves within narrow bays, because the general Carrier-Greenspan transform can turn the Cross-Sectionally Averaged Shallow Water equations (for shorelines of constant slope) into a particular form of the matrix Klein-Gordon equation. Thus the matrix Klein-Gordon equation governs the run-up of tsunami waves along shorelines of constant slope. If the narrow bay is U-shaped, the Cross-Sectionally Averaged Shallow Water equations have a known general solution via solving the transformed matrix Klein-Gordon equation. However, the initial conditions for our Klein-Gordon equation are given on a curve. Thus our goal is to solve the matrix Klein-Gordon equation with known conditions given along a curve. Therefore we present a method to extrapolate values on a line from conditions on a curve, via the Taylor formula. Finally, to apply our solution to the Cross-Sectionally Averaged Shallow Water equations, our numerical simulations demonstrate how Gaussian and N-wave profiles affect the run-up of tsunami waves within various U-shaped bays.
    • An exploration of two infinite families of snarks

      Ver Hoef, Lander; Berman, Leah; Williams, Gordon; Faudree, Jill (2019-05)
      In this paper, we generalize a single example of a snark that admits a drawing with even rotational symmetry into two infinite families using a voltage graph construction technique derived from cyclic Pseudo-Loupekine snarks. We expose an enforced chirality in coloring the underlying 5-pole that generated the known example, and use this fact to show that the infinite families are in fact snarks. We explore the construction of these families in terms of the blowup construction. We show that a graph in either family with rotational symmetry of order m has automorphism group of order m·2ᵐ⁺¹. The oddness of graphs in both families is determined exactly, and shown to increase linearly with the order of rotational symmetry.
    • A geostatistical model based on Brownian motion to Krige regions in R² with irregular boundaries and holes

      Bernard, Jordy; McIntyre, Julie; Barry, Ron; Goddard, Scott (2019-05)
      Kriging is a geostatistical interpolation method that produces predictions and prediction intervals. Classical kriging models use Euclidean (straight line) distance when modeling spatial autocorrelation. However, for estuaries, inlets, and bays, shortest-in-water distance may capture the system’s proximity dependencies better than Euclidean distance when boundary constraints are present. Shortest-in-water distance has been used to krige such regions (Little et al., 1997; Rathbun, 1998); however, the variance-covariance matrices used in these models have not been shown to be mathematically valid. In this project, a new kriging model is developed for irregularly shaped regions in R². This model incorporates the notion of flow-connected distance into a valid variance-covariance matrix through the use of a random walk on a lattice, process convolutions, and the non-stationary kriging equations. The model developed in this paper is compared to existing methods of spatial prediction over irregularly shaped regions using water quality data from Puget Sound.
    • Species network inference under the multispecies coalescent model

      Baños Cervantes, Hector Daniel; Allman, Elizabeth S.; Rhodes, John A.; Barry, Ronald; Faudree, Jill (2019-05)
      Species network inference is a challenging problem in phylogenetics. In this work, we present two results on this problem. The first shows that many topological features of a level-1 network are identifiable under the network multispecies coalescent model (NMSC). Specifically, we show that one can identify from gene tree frequencies the unrooted semidirected species network, after suppressing all cycles of size less than 4. The second presents the theory behind a new, statistically consistent, practical method for the inference of level-1 networks under the NMSC. The input for this algorithm is a collection of unrooted topological gene trees, and the output is an unrooted semidirected species network.