Mathematics and Statistics
Recent Submissions

An intertaxon dissimilarity for level-1 phylogenetic networks from rooted triples
An important problem in evolutionary biology is how to infer the evolutionary history of a set of taxa in the presence of hybridization. In this work, we present an important first step in the establishment of an algorithm for the inference of a level-1 species network under the Network Multispecies Coalescent Model using rooted triples. In particular, we define a new intertaxon dissimilarity, dRT, which can be directly computed from the topology of the rooted triples displayed on a network, and might be estimated from data. In our main result we prove that this dissimilarity is equivalent to an intertaxon splits dissimilarity, dλRT, which can be used to reconstruct a level-1 species network using the NeighborNet and Circular Network Algorithms.

Estimation of caribou herd size using the Rivest estimator with Monte Carlo
Estimating caribou herd size from aerial telemetry surveys is predominantly achieved through the use of the Rivest estimator. Despite the underlying assumptions of the Rivest estimator, it is being used to estimate caribou herd sizes in Alaska and the northern Canadian provinces. The goal of this paper is to create simulated herds as a way to see how well the Rivest estimator works. To examine how well this estimator works under the homogeneous model of the Phase II sampling, we estimate the herd sizes of caribou herd groups from the Western Arctic, Teshekpuk, and Mulchatna herds in Alaska by simulating group sizes and detected collars from the truncated continuous power law and the Poisson distributions. The multinomial distribution was used during the simulation to ensure that the assumption of random mixing of collared caribou amongst the rest of the population is met. The Rivest estimates of herd size from the original and simulated data closely matched over 80% of the time; hence, we conclude that the estimator works with simulated herds.

Active learning and corequisite instruction in calculus I: a preliminary analysis
An experiment examined the effectiveness of corequisite precalculus support and active learning modifications on success rates in an institution’s Calculus 1 course. Proctored ALEKS testing early in both control and experimental semesters provides a strong method of standardizing students’ precalculus content knowledge. The experimental effect was found to significantly increase the odds of passing the overall Calculus 1 course and passing the Calculus 1 final exam. Early comparison suggests the experimental effect led to an increase in Calculus 2 enrollment, but no substantial impact on Calculus 2 success rates was found. Comparison of changes in ALEKS scores over the semester for Calculus 1 students showed little evidence of change in precalculus content knowledge. Overall, the pedagogical changes implemented led to higher pass rates in Calculus 1.

A novel method for simulation of telemetry data based on Canadian Lynx
In movement ecology, one frequently encounters situations in which a test statistic is easy to define but its distribution is difficult or impossible to compute in closed form. As such, it is of interest to find methods for simulating data which can be used to approximate the null distribution of such test statistics. In this paper, we describe a motivating scenario for simulating data involving Canadian lynx collared by researchers in the Alaskan arctic. We initially use a hidden Markov model (HMM) to model the behavioral patterns of these animals, and use kernel density estimation to describe their usage distributions. We then describe a novel method for simulating animal tracks based on these telemetry data, which closely preserves the HMM and kernel density estimate (KDE) while removing any causal dependency between them. Finally, we apply this method to identify relationships between an individual’s behavioral state and location within its home range.

Computing prime factorizations with neural networks
When dealing with sufficiently large integers, even the most cutting-edge existing algorithms for computing prime factorizations are impractically slow. In this paper, we explore the possibility of using neural networks to approximate prime factorizations in the hopes of providing an alternative factorization method which trades accuracy for speed. Due to the intrinsic difficulty associated with this task, the focus of this paper is largely concentrated on the obstacles encountered in the training of the neural net, rather than on the viability of the method itself.

Performance of Gaussian Naïve Bayes for classification with dependencies from Archimedean copula
Naive Bayes is an application of Bayes' theorem in which the likelihood function is factored into marginals by making the assumption that the variables are independent. Naive Bayes is typically used for classification problems in which the goal is to find the class with the largest probability given the data on hand. When the data on hand are continuous real numbers we can further assume they are class-conditionally normally distributed, which is a particular version of Naive Bayes called Gaussian Naive Bayes. This paper explores when Gaussian Naive Bayes classification problems work well versus when they do not. Typically when assumptions are not valid, valid conclusions cannot be drawn. However, Naive Bayes is known to be robust even when the independence assumption is not met. We show using simulations that binary classification accuracy of Naive Bayes is much more sensitive to differences in the class-conditional marginal distributions than to the correlation between predictors. Additionally, we show that Naive Bayes completely fails when predictors are generated using a Gumbel copula and compare results with a general Bayes classifier and the K-Nearest Neighbors classifier.
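
As a rough illustration of the classifier described in this abstract, the following Python sketch fits a Gaussian Naive Bayes model by estimating a prior, per-feature means, and per-feature variances for each class, then predicts the class with the largest log posterior. The function names, the small variance floor, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a log prior, feature means, and feature variances per class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    model = {}
    for label, members in grouped.items():
        columns = list(zip(*members))
        means = [sum(col) / len(col) for col in columns]
        # Maximum-likelihood variances; the 1e-9 floor is an assumption that
        # avoids division by zero when a feature is constant within a class.
        variances = [
            sum((x - m) ** 2 for x in col) / len(col) + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(len(members) / len(rows)), means, variances)
    return model


def predict(model, row):
    """Return the class with the largest log posterior for one observation."""

    def log_posterior(label):
        log_prior, means, variances = model[label]
        # Independence assumption: the joint log likelihood is a sum of
        # univariate normal log densities, one per feature.
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances)
        )
        return log_prior + log_likelihood

    return max(model, key=log_posterior)


# Hypothetical two-class toy data with two continuous predictors.
rows = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (4.0, 5.0), (4.2, 4.8), (3.8, 5.2)]
labels = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(rows, labels)
```

Because each feature contributes its own univariate term, dependence between predictors (for example, dependence induced by a copula) is ignored by construction, which is the assumption the abstract sets out to stress-test.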

Testing assumptions and bias of a caribou population estimator through Monte Carlo simulation
The Rivest method is the standard way to estimate caribou herd sizes in Alaska and the northern Canadian provinces. Biologists employ radio telemetry to detect discrete groups that make up the wider herd; the Rivest estimator provides an approximate herd size by enumerating the collared and uncollared animals within each group. A key assumption of this technique is that collared caribou mix randomly amongst the wider herd. In this report I scrutinize the accuracy of the Rivest estimator and evaluate three competing hypothesis tests for testing its random-mixing assumption under simulated conditions. Of the three, Fisher’s Exact Test is the optimal test for detecting violations of random mixing. I found the Rivest method underestimates caribou herd size in simulations where the random-mixing assumption was violated to a large degree.

Exploratory analysis of avian point pattern data: approximating methods of intensity on airfield habitat of interior Alaska
The Animal and Plant Health Inspection Service agency of the United States Department of Agriculture began their work on United States Army Garrison Fort Wainwright of Alaska in 2018. In conjunction with airfield personnel, the main objective of this agency is to protect aircraft on Ladd Army Airfield (LAAF) from wildlife hazards and mitigate human-wildlife interactions on Post. The main wildlife hazard for aircraft is of the avian variety. The patterns of avian use on LAAF were examined for the first time using various nonparametric and parametric spatial methods. The main nonparametric technique applied was kernel density estimation of points in two-dimensional contour plots and three-dimensional surfaces. As for parametric means, Poisson point process modeling was used to estimate intensity (points per unit area) of the spatial region in question. Each year displayed a unique pattern of use among density plots that were consistent with an inhomogeneous process upon tests of complete spatial randomness. The baseline estimated intensities (homogeneous process) for years 2018, 2019, and 2020 were 1.609, 0.986, and 1.450 observations per hectare, respectively. Spatial locations as covariates revealed that intensity varies in north-south or east-west directions depending on the year. In addition to fleshing out the dataset at hand, I outline theory and steps taken to numerically approximate the likelihood of the inhomogeneous Poisson point process. Logistic regression of observations on a continuous covariate (minimum distance to water) was used to demonstrate that fine pixel approximation yields adequate estimates of intensity.

Population estimates of brown bears near Yakutat, Alaska using a Bayesian integrated population model
This paper discusses the specification of an Integrated Population Model (IPM) for the brown bear (Ursus arctos) population within and near Yakutat, Alaska. A Bayesian analysis is used in this paper in conjunction with Markov Chain Monte Carlo in order to produce the results. The goal of this project is to estimate the number of brown bears over multiple years. This project was made possible in collaboration with the Alaska Department of Fish and Game (ADF&G) Department of Wildlife Conservation through Anthony Crupi and Jason Waite.

Implementing a proctored ALEKS-based adaptive learning strategy to evaluate student preparedness and predict success in Calculus I at the University of Alaska Fairbanks
In Fall 2016, the Department of Mathematics and Statistics (DMS) at the University of Alaska Fairbanks (UAF) initiated a study to determine if the Assessment and LEarning in Knowledge Spaces Placement, Preparation, and Learning test (ALEKS PPL), given under proctored conditions, can be an effective tool for measuring student preparedness and predicting success in Calculus I while controlling for enrollment and demographic information. The study includes 583 students who took a Calculus I course in the Fall 2016, Fall 2017, Spring 2018, Fall 2018, Spring 2019, and Spring 2020 semesters and took a Proctored ALEKS PPL test (PAPL). Of the 583 students, 301 (52%) obtained a score of at least 75 on the PAPL test, and 338 (58%) were successful in the Calculus I course. The average score on the PAPL test was 13 points higher for the students who had been successful in Calculus I (P < 0.0001, Welch’s t-test). Logistic regression showed that each additional point on the PAPL test was associated with an increase in the odds of success in Calculus I by a factor of 1.0843, or 8.43%, when all other factors were held fixed (95% CI: 1.07 to 1.1, P < 0.0001). This study recommends implementing the PAPL test as an adaptive learning tool and a requirement for Calculus I at UAF and establishing a student-centered standard to address the student knowledge gap in precalculus.
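
The relationship between the reported odds ratio and the quoted percentage can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and the only figure taken from the abstract is the odds ratio of 1.0843 per additional PAPL point. It assumes the usual logistic-regression reading, in which the odds ratio is the exponential of the fitted coefficient and effects multiply across points.

```python
import math

# Odds ratio per additional PAPL point, as reported in the abstract.
ODDS_RATIO_PER_POINT = 1.0843


def percent_change_in_odds(odds_ratio):
    """Convert an odds ratio into the percent change in odds it implies."""
    return (odds_ratio - 1.0) * 100.0


def log_odds_coefficient(odds_ratio):
    """Recover the logistic-regression coefficient implied by an odds ratio."""
    return math.log(odds_ratio)


def odds_multiplier(points, odds_ratio=ODDS_RATIO_PER_POINT):
    """Multiplicative change in odds for a score difference of `points`,
    holding every other predictor fixed."""
    return odds_ratio ** points


single_point_percent = percent_change_in_odds(ODDS_RATIO_PER_POINT)
ten_point_multiplier = odds_multiplier(10)
```

Under these assumptions a one-point difference corresponds to the 8.43% figure in the abstract, while a ten-point difference compounds to a little more than a doubling of the odds rather than an 84.3% increase.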

Introduction to exponential random graph models
Exponential random graph models (ERGMs) are used for analyzing network data for a variety of applications. Vertices, or nodes, represent entities, and edges, or ties, represent connections between entities. The ERGM allows for a representation of edges in structures (from lone edges to triangles and cycles) as an exponential family random variable, a known family of distributions with known properties, such as the ability to show that statistics are complete or sufficient by inspecting the form of the distribution. This paper provides an introduction to the topic with both theoretical and applied information, starting with an introduction to the necessary graph theory, graph structures, and theoretical background for fitting models, then moving on to worked examples using the Statnet package.

Bayesian cluster analysis to determine institutional peers
Academic peers are a type of institutional peer group often identified for the purpose of assessing an institution’s performance in student success. This project introduces the consideration of a different type of peer group, student population peers, when assessing the performance of the University of Alaska Fairbanks in comparison to its currently identified academic peers. A Bayesian-influenced clustering analysis is used to determine student population peer groups; these peer groups are constructed using student metrics data retrieved from the IPEDS data center, thus providing an emphasis on the student body an institution serves. The R package bclust is used to compute our Bayesian cluster analysis. We find that none of the university’s listed academic peers are closely related to the University of Alaska Fairbanks when using student-body-focused clustering analysis. Using the Bayesian cluster analysis for determining student population peers of the University of Alaska Fairbanks, we allow a more comprehensive discussion of the university’s performance in student success when comparing outcome measures between the university and its current academic peers.

Control and inverse problems for the wave equation on metric graphs
This thesis focuses on control and inverse problems for the wave equation on finite metric graphs. The first part deals with the control problem for the wave equation on tree graphs. We propose new constructive algorithms for solving initial boundary value problems on general graphs and boundary control problems on tree graphs. We demonstrate that the wave equation on a tree is exactly controllable if and only if controls are applied at all or all but one of the boundary vertices. We find the minimal controllability time and prove that our result is optimal in the general case. The second part deals with the inverse problem for the wave equation on tree graphs. We describe the dynamical Leaf Peeling (LP) method. The main step of the method is recalculating the response operator from the original tree to a peeled tree. The LP method allows us to recover the connectivity, potential function on a tree graph and the lengths of its edges from the response operator given on a finite time interval. In the third part we consider the control problem for the wave equation on graphs with cycles. Among all vertices and edges we choose certain active vertices and edges, and give a constructive proof that the wave equation on the graph is exactly controllable if Neumann controllers are placed at the active vertices and Dirichlet controllers are placed at the active edges. The control time for this construction is determined by the chosen orientation and path decomposition of the graph. We indicate the optimal time that guarantees the exact controllability for all systems of a described class on a given graph. While the choice of the active vertices and edges is not unique, we find the minimum number of controllers to guarantee the exact controllability as a graph invariant.

Controllability of non-self-adjoint systems of partial differential equations
In this dissertation, we first consider the problem of exact controllability of a system of N one-dimensional coupled wave equations when the control is exerted on a part of the boundary by means of one control. We provide a Kalman condition (necessary and sufficient) and give a description of the attainable set. The second problem we consider is the inverse problem for the vector Schrödinger equation on the interval with a non-self-adjoint matrix potential. In doing so, we prove controllability of the system and develop a method to recover spectral data from the system. Then, we solve the inverse problem using techniques of the Boundary Control method. The final problem is that of internal null controllability of a beam equation on an interval. We provide a partial characterization for controllability for arbitrary open subsets where the control is applied.

An invitation to gauge theory
We introduce the audience to the mathematics of gauge theory. We begin by formalizing the intuitive concepts of smoothness, tangency, symmetry, constancy, and parallelism. Building up to a theory of parallel transport in associated fiber bundles, we study principal connections in principal bundles as well as the related notions of curvature and holonomy. In particular, we conclude with a nonabelian Stokes's theorem which recasts holonomy in terms of curvature.

Control problems for the wave and telegrapher's equations on metric graphs
The dissertation focuses on control problems for the wave and telegrapher's equations on metric graphs. In the first part, an algorithm is constructed and implemented numerically to solve the exact control problems on finite intervals. Moreover, we develop numerical algorithms for the solution of control problems on metric graphs based on the recent boundary controllability results of wave equations on metric graphs. We present numerical solutions to shape control problems on quantum graphs. Specifically, we present the results of numerical experiments involving a three-star graph. Our second part deals with the forward and control problems for the telegrapher's equations on metric graphs. We consider the forward problem on general graphs and develop an algorithm that solves equations with variable resistance, conductance, constant inductance, and constant capacitance. An algorithm is developed to solve the voltage and current control problems on a finite interval for constant inductance and capacitance, and variable resistance and conductance. Numerical results are also presented for this case. Finally, we consider the control problems for the telegrapher's equations on metric graphs. The control problem is considered on tree graphs, i.e. graphs without cycles, with some restrictions on the coefficients. Specifically, we consider equations with constant coefficients that do not depend on the edge. We obtain the necessary and sufficient conditions of the exact controllability and indicate the minimal control time.

A Bayesian mixed multistate open-robust design mark-recapture model to estimate heterogeneity in transition rates in an imperfectly detected system
Multistate mark-recapture models have long been used to assess ecological and demographic parameters such as survival, phenology, and breeding rates by estimating transition rates among a series of latent or observable states. Here, we introduce a Bayesian mixed multistate open robust design mark-recapture model (MSORD), with random intercepts and slopes to explore individual heterogeneity in transition rates and individual responses to covariates. We fit this model to simulated data sets to test whether the model could accurately and precisely estimate five parameters, set to known values a priori, under varying sampling schemes. To assess the behavior of the model integrated across replicate fits, we employed a two-stage hierarchical model fitting algorithm for each of the simulations. The majority of model fits showed no sign of inadequate convergence according to our metrics, with 81.25% of replicate posteriors for parameters of interest having general agreement among chains (r < 1.1). Estimates of posterior distributions for mean transition rates and standard deviation in random intercepts were generally well-defined. However, we found that models estimated the standard deviation in random slopes and the correlation among random effects relatively poorly, especially in simulations with low power to detect individuals (e.g. low detection rates, study duration, or secondary samples). We also apply this model to a dataset of 200 female grey seals breeding on Sable Island from 1985-2018 to estimate individual heterogeneity in reproductive rate and response to near-exponential population growth. The Bayesian MSORD estimated substantial variation among individuals in both mean transition rates and responses to population size. The correlation among effects trended positively, indicating that females with high reproductive performance (more positive intercept) were also more likely to respond better to population growth (more positive slope) and vice versa. Though our simulation results lend confidence to analyses using this method on well-developed datasets on highly observable systems, we caution against the use of this framework in sparse data situations.

Analysis of GNAC volleyball using the Bradley-Terry Model
Ranking is the process by which a set of objects is assigned a linear ordering based on some property that they possess. Not surprisingly, there are many different methods of ranking used in a wide array of diverse applications; ranking plays a vital role in sports analysis, preference testing, search engine optimization, psychological research, and many other areas. One of the more popular ranking models is Bradley-Terry, which is a type of aggregation ranking that has been used mostly within the realm of sports. Bradley-Terry uses the outcome of individual matchups (paired comparisons) to create rankings using maximum-likelihood estimation. This project aims to briefly examine the motivation for modeling sporting events, review the history of ranking and aggregation ranking, communicate the mathematical theory behind the Bradley-Terry model, and apply the model to a novel volleyball dataset.
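
To make the maximum-likelihood step concrete, the Python sketch below fits Bradley-Terry strengths with the classical iterative (minorization-maximization) update, in which each team's strength is set to its win total divided by a sum over opponents of games played over combined strength. This is one standard way to maximize the likelihood and is not necessarily the procedure used in the project; the function name and the win counts are hypothetical, and the sketch assumes every team has both won and lost at least once so that the estimates are finite.

```python
def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from a square matrix of win counts.

    wins[i][j] is the number of times team i beat team j; the diagonal is
    assumed to be zero. Returned strengths are normalized to sum to one.
    """
    n = len(wins)
    strengths = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i])
            # Games played against each opponent, weighted by the current
            # combined strength of the pair.
            denominator = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denominator)
        norm = sum(updated)
        strengths = [s / norm for s in updated]
    return strengths


def win_probability(strengths, i, j):
    """Model probability that team i beats team j."""
    return strengths[i] / (strengths[i] + strengths[j])


# Hypothetical results for three teams, each pair meeting four times.
example_wins = [
    [0, 3, 4],
    [1, 0, 3],
    [0, 1, 0],
]
example_strengths = fit_bradley_terry(example_wins)
```

Sorting teams by the fitted strengths gives the ranking, and `win_probability` turns any pair of strengths into a predicted matchup outcome.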

Simulating distance sampling to estimate nest abundance on the Yukon-Kuskokwim Delta, Alaska
The U.S. Fish and Wildlife Service currently conducts annual surveys to estimate bird nest abundance on the Yukon-Kuskokwim Delta, Alaska. The current method involves intensive searching on large plots with the goal of finding every nest on the plot. Distance sampling is a well-established transect-based method to estimate density or abundance that accounts for imperfect detection of objects. It relies on estimating the probability of detecting an object given its distance from the transect line, or the detection function. Simulations were done using R to explore whether distance sampling methods on the Yukon-Kuskokwim Delta could produce reliable estimates of nest abundance. Simulations were executed both with geographic strata based on estimated Spectacled Eider (Somateria fischeri) nest densities and without stratification. Simulations with stratification where more effort was allotted to high density areas tended to be more precise, but lacked the property of pooling robustness and assumed stratum boundaries would not change over time. Simulations without stratification yielded estimates with relatively low bias and variances comparable to current estimation methods. Distance sampling appears to be a viable option for estimating the abundance of nests on the Yukon-Kuskokwim Delta.
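
The role of the detection function can be illustrated with a small Python sketch (the project itself used R). It assumes a half-normal detection function with no truncation distance, which the abstract does not specify; under that assumption the maximum-likelihood scale estimate is the root mean square of the perpendicular distances, and density is the number of detections divided by twice the transect length times the effective strip half-width. All names and numbers here are hypothetical.

```python
import math


def half_normal_sigma(distances):
    """Maximum-likelihood scale of an untruncated half-normal detection
    function, computed from perpendicular distances to the transect line."""
    return math.sqrt(sum(x * x for x in distances) / len(distances))


def detection_probability(distance, sigma):
    """Half-normal detection function: probability of detecting an object
    at the given perpendicular distance (equal to 1 on the line itself)."""
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))


def density_estimate(distances, transect_length):
    """Estimated objects per unit area for a line-transect survey."""
    sigma = half_normal_sigma(distances)
    # Effective strip half-width: the integral of the detection function
    # over all distances on one side of the line.
    effective_half_width = sigma * math.sqrt(math.pi / 2)
    return len(distances) / (2 * transect_length * effective_half_width)


# Hypothetical perpendicular distances (in meters) on a 1000 m transect.
example_distances = [2.0, 5.5, 1.0, 8.0, 3.5, 0.5, 6.0]
example_density = density_estimate(example_distances, transect_length=1000.0)
```

Multiplying the density by the area of the study region gives an abundance estimate; a simulation study of the kind described above would repeat this over many simulated surveys to assess bias and variance.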

Multiple imputation of missing multivariate atmospheric chemistry time series data from Denali National Park
This paper explores a technique where we impute missing values for an incomplete dataset via multiple imputation. Incomplete data is one of the most common issues in data analysis and often occurs when measuring chemical and environmental data. The dataset that we used in the model consists of 26 atmospheric particulates or elements that were measured semiweekly in Denali National Park from 1988 to 2015. Samples were collected alternately three and four days apart from 3/2/88 to 9/30/00, and consistently every three days from 10/3/00 to 12/29/15. For this reason, the data were initially partitioned into two in case the separation between collection days would have an impact. With further analysis, we concluded that the misalignments between the two datasets had very little or no impact on our analysis and therefore combined the two. After running five Markov chains of 1000 iterations we concluded that the model stayed consistent between the five chains. We found that, in order to get a better understanding of how well the imputed values did, more exploratory analysis on the imputed datasets would be required.