Mathematics and Statistics
Recent Submissions

Bayesian cluster analysis to determine institutional peers
Academic peers are a type of institutional peer group often identified for the purpose of assessing an institution’s performance in student success. This project introduces the consideration of a different type of peer group, student population peers, when assessing the performance of the University of Alaska Fairbanks in comparison to its currently identified academic peers. A Bayesian-influenced clustering analysis is used to determine student population peer groups; these peer groups are constructed using student metrics data retrieved from the IPEDS data center, thus emphasizing the student body an institution serves. The R package bclust is used to compute our Bayesian cluster analysis. We find that none of the university’s listed academic peers are closely related to the University of Alaska Fairbanks under a clustering analysis focused on the student body. Using the Bayesian cluster analysis to determine student population peers of the University of Alaska Fairbanks allows a more comprehensive discussion of the university’s performance in student success when comparing outcome measures between the university and its current academic peers.

Control and inverse problems for the wave equation on metric graphs
This thesis focuses on control and inverse problems for the wave equation on finite metric graphs. The first part deals with the control problem for the wave equation on tree graphs. We propose new constructive algorithms for solving initial boundary value problems on general graphs and boundary control problems on tree graphs. We demonstrate that the wave equation on a tree is exactly controllable if and only if controls are applied at all or all but one of the boundary vertices. We find the minimal controllability time and prove that our result is optimal in the general case. The second part deals with the inverse problem for the wave equation on tree graphs. We describe the dynamical Leaf Peeling (LP) method. The main step of the method is recalculating the response operator from the original tree to a peeled tree. The LP method allows us to recover the connectivity of a tree graph, its potential function, and the lengths of its edges from the response operator given on a finite time interval. In the third part we consider the control problem for the wave equation on graphs with cycles. Among all vertices and edges we choose certain active vertices and edges, and give a constructive proof that the wave equation on the graph is exactly controllable if Neumann controllers are placed at the active vertices and Dirichlet controllers are placed at the active edges. The control time for this construction is determined by the chosen orientation and path decomposition of the graph. We indicate the optimal time that guarantees exact controllability for all systems of the described class on a given graph. While the choice of the active vertices and edges is not unique, we find the minimum number of controllers needed to guarantee exact controllability as a graph invariant.

Controllability of nonselfadjoint systems of partial differential equations
In this dissertation, we first consider the problem of exact controllability of a system of N one-dimensional coupled wave equations when the control is exerted on a part of the boundary by means of one control. We provide a Kalman condition (necessary and sufficient) and give a description of the attainable set. The second problem we consider is the inverse problem for the vector Schrödinger equation on the interval with a nonselfadjoint matrix potential. In doing so, we prove controllability of the system and develop a method to recover spectral data from the system. Then, we solve the inverse problem using techniques of the Boundary Control method. The final problem is that of internal null controllability of a beam equation on an interval. We provide a partial characterization for controllability for arbitrary open subsets where the control is applied.

An invitation to gauge theory
We introduce the audience to the mathematics of gauge theory. We begin by formalizing the intuitive concepts of smoothness, tangency, symmetry, constancy, and parallelism. Building up to a theory of parallel transport in associated fiber bundles, we study principal connections in principal bundles as well as the related notions of curvature and holonomy. In particular, we conclude with a nonabelian Stokes's theorem which recasts holonomy in terms of curvature.

Control problems for the wave and telegrapher's equations on metric graphs
The dissertation focuses on control problems for the wave and telegrapher's equations on metric graphs. In the first part, an algorithm is constructed and implemented numerically to solve the exact control problems on finite intervals. Moreover, we develop numerical algorithms for the solution of control problems on metric graphs based on recent boundary controllability results for wave equations on metric graphs. We present numerical solutions to shape control problems on quantum graphs. Specifically, we present the results of numerical experiments involving a three-star graph. The second part deals with the forward and control problems for the telegrapher's equations on metric graphs. We consider the forward problem on general graphs and develop an algorithm that solves equations with variable resistance, variable conductance, constant inductance, and constant capacitance. An algorithm is developed to solve the voltage and current control problems on a finite interval for constant inductance and capacitance, and variable resistance and conductance. Numerical results are also presented for this case. Finally, we consider the control problems for the telegrapher's equations on metric graphs. The control problem is considered on tree graphs, i.e., graphs without cycles, with some restrictions on the coefficients. Specifically, we consider equations with constant coefficients that do not depend on the edge. We obtain necessary and sufficient conditions for exact controllability and indicate the minimal control time.

A Bayesian mixed multistate open-robust design mark-recapture model to estimate heterogeneity in transition rates in an imperfectly detected system
Multistate mark-recapture models have long been used to assess ecological and demographic parameters such as survival, phenology, and breeding rates by estimating transition rates among a series of latent or observable states. Here, we introduce a Bayesian mixed multistate open robust design mark-recapture model (MSORD), with random intercepts and slopes to explore individual heterogeneity in transition rates and individual responses to covariates. We fit this model to simulated data sets to test whether the model could accurately and precisely estimate five parameters, set to known values a priori, under varying sampling schemes. To assess the behavior of the model integrated across replicate fits, we employed a two-stage hierarchical model fitting algorithm for each of the simulations. The majority of model fits showed no sign of inadequate convergence according to our metrics, with 81.25% of replicate posteriors for parameters of interest having general agreement among chains (r < 1.1). Estimates of posterior distributions for mean transition rates and standard deviation in random intercepts were generally well-defined. However, we found that models estimated the standard deviation in random slopes and the correlation among random effects relatively poorly, especially in simulations with low power to detect individuals (e.g., low detection rates, short study duration, or few secondary samples). We also apply this model to a dataset of 200 female grey seals breeding on Sable Island from 1985 to 2018 to estimate individual heterogeneity in reproductive rate and response to near-exponential population growth. The Bayesian MSORD estimated substantial variation among individuals in both mean transition rates and responses to population size.
The correlation among effects trended positively, indicating that females with high reproductive performance (more positive intercept) were also more likely to respond better to population growth (more positive slope) and vice versa. Though our simulation results lend confidence to analyses using this method on well-developed datasets from highly observable systems, we caution against the use of this framework in sparse data situations.
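The convergence criterion quoted above ("r < 1.1") reads like the potential scale reduction factor of Gelman and Rubin, although the abstract does not name the statistic, so that reading is an assumption. Under that assumption, a minimal sketch of how agreement among chains could be checked is shown below; the function name and the simulated chains are hypothetical and are not taken from the thesis.

```python
import random
import statistics


def potential_scale_reduction(chains):
    """Gelman-Rubin statistic for equal-length chains of one scalar parameter."""
    n = len(chains[0])  # draws per chain
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average of the within-chain sample variances
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    # B: n times the sample variance of the chain means
    between = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


rng = random.Random(0)
# Hypothetical draws: four chains sampling the same distribution ...
mixed_chains = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]
# ... and four chains stuck around different values
stuck_chains = [[rng.gauss(float(k), 1.0) for _ in range(500)] for k in range(4)]

mixed_rhat = potential_scale_reduction(mixed_chains)
stuck_rhat = potential_scale_reduction(stuck_chains)
print(f"well-mixed chains: {mixed_rhat:.3f}")
print(f"stuck chains:      {stuck_rhat:.3f}")
```

Values close to 1 indicate that the between-chain variance is small relative to the within-chain variance, which is what a threshold such as 1.1 is meant to detect.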

Analysis of GNAC Volleyball using the Bradley-Terry Model
Ranking is the process by which a set of objects is assigned a linear ordering based on some property that they possess. Not surprisingly, there are many different methods of ranking used in a wide array of diverse applications; ranking plays a vital role in sports analysis, preference testing, search engine optimization, psychological research, and many other areas. One of the more popular ranking models is Bradley-Terry, which is a type of aggregation ranking that has been used mostly within the realm of sports. Bradley-Terry uses the outcome of individual matchups (paired comparisons) to create rankings using maximum-likelihood estimation. This project aims to briefly examine the motivation for modeling sporting events, review the history of ranking and aggregation ranking, communicate the mathematical theory behind the Bradley-Terry model, and apply the model to a novel volleyball dataset.
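To make the maximum-likelihood step concrete, the sketch below fits Bradley-Terry strengths with the classical iterative update, in which each strength is set to the team's win total divided by a sum of games played weighted by current strengths. The abstract does not say how the project computes its estimates, so this is an illustration only. The win matrix is invented toy data, not GNAC results, and the function names are hypothetical.

```python
def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from a win matrix.

    wins[i][j] is the number of times team i beat team j.
    Returns strengths normalized to sum to 1.
    """
    n = len(wins)
    strengths = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i])
            # Games between i and j, weighted by the current strengths
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        norm = sum(updated)
        strengths = [s / norm for s in updated]
    return strengths


def win_probability(strengths, i, j):
    """Model probability that team i beats team j."""
    return strengths[i] / (strengths[i] + strengths[j])


# Toy data (hypothetical): three teams, four matches per pairing
wins = [
    [0, 3, 4],
    [1, 0, 3],
    [0, 1, 0],
]
strengths = fit_bradley_terry(wins)
ranking = sorted(range(len(strengths)), key=lambda i: strengths[i], reverse=True)
print("ranking (best first):", ranking)
print("P(team 0 beats team 1):", round(win_probability(strengths, 0, 1), 3))
```

The update assumes every team has at least one win and that the schedule connects all teams; otherwise the maximum-likelihood estimate does not exist in the interior and the iteration drifts toward zero or infinite strengths.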

Simulating distance sampling to estimate nest abundance on the Yukon-Kuskokwim Delta, Alaska
The U.S. Fish and Wildlife Service currently conducts annual surveys to estimate bird nest abundance on the Yukon-Kuskokwim Delta, Alaska. The current method involves intensive searching on large plots with the goal of finding every nest on the plot. Distance sampling is a well-established transect-based method to estimate density or abundance that accounts for imperfect detection of objects. It relies on estimating the detection function, that is, the probability of detecting an object given its distance from the transect line. Simulations were done using R to explore whether distance sampling methods on the Yukon-Kuskokwim Delta could produce reliable estimates of nest abundance. Simulations were executed both with geographic strata based on estimated Spectacled Eider (Somateria fischeri) nest densities and without stratification. Simulations with stratification where more effort was allotted to high density areas tended to be more precise, but lacked the property of pooling robustness and assumed stratum boundaries would not change over time. Simulations without stratification yielded estimates with relatively low bias and variances comparable to current estimation methods. Distance sampling appears to be a viable option for estimating the abundance of nests on the Yukon-Kuskokwim Delta.
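The role of the detection function can be illustrated with a small simulation. The project ran its simulations in R. The sketch below uses Python and a half-normal detection function, which is a common choice but an assumption here, and every numeric value (density, transect length, strip width, scale) is made up. It scatters nests around a single transect, keeps those that are detected, and recovers density from the detected distances alone.

```python
import math
import random


def simulate_detections(true_density, line_length, half_width, sigma, rng):
    """Perpendicular distances of detected nests along one straight transect."""
    n_nests = round(true_density * 2 * half_width * line_length)
    detected = []
    for _ in range(n_nests):
        distance = rng.uniform(0.0, half_width)
        # Half-normal detection function: certain on the line, fading with distance
        if rng.random() < math.exp(-distance ** 2 / (2 * sigma ** 2)):
            detected.append(distance)
    return detected


def estimate_density(distances, line_length):
    """Half-normal line-transect estimator, ignoring truncation."""
    n = len(distances)
    # Maximum-likelihood estimate of the half-normal scale
    sigma_hat = math.sqrt(sum(d * d for d in distances) / n)
    # Effective strip half-width: integral of the detection function
    effective_half_width = sigma_hat * math.sqrt(math.pi / 2)
    return n / (2 * effective_half_width * line_length)


rng = random.Random(1)
true_density = 0.05  # nests per square metre (made-up value)
distances = simulate_detections(
    true_density, line_length=1000.0, half_width=50.0, sigma=10.0, rng=rng
)
estimated_density = estimate_density(distances, line_length=1000.0)
print(len(distances), "nests detected; estimated density:", round(estimated_density, 4))
```

The closed-form scale estimate is only appropriate because the strip half-width is five times the scale, so truncation is negligible; with tighter truncation the likelihood would need to be maximized numerically.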

Multiple imputation of missing multivariate atmospheric chemistry time series data from Denali National Park
This paper explores a technique in which we impute missing values for an incomplete dataset via multiple imputation. Incomplete data is one of the most common issues in data analysis and often occurs when measuring chemical and environmental data. The dataset that we used in the model consists of 26 atmospheric particulates or elements that were measured semiweekly in Denali National Park from 1988 to 2015. Samples were collected alternately three and four days apart from 3/2/88 to 9/30/00, and consistently every three days from 10/3/00 to 12/29/15. For this reason, the data were initially partitioned into two datasets in case the spacing between collection days had an impact. With further analysis, we concluded that the misalignments between the two datasets had very little or no impact on our analysis and therefore combined the two. After running five Markov chains of 1000 iterations, we concluded that the model stayed consistent across the five chains. We found that more exploratory analysis of the imputed datasets would be required to better understand how well the imputed values performed.

An exposition on the Kronecker-Weber theorem
The Kronecker-Weber Theorem is a classification result from Algebraic Number Theory. Theorem (Kronecker-Weber). Every finite, abelian extension of Q is contained in a cyclotomic field. This result was originally proven by Leopold Kronecker in 1853. However, his proof had some gaps that were later filled by Heinrich Martin Weber in 1886 and David Hilbert in 1896. Hilbert's strategy for the proof eventually led to the creation of the field of mathematics called Class Field Theory, which is the study of finite, abelian extensions of arbitrary fields and is still an area of active research. Not only is the Kronecker-Weber Theorem surprising, its proof is truly amazing. The idea of the proof is that for a finite, Galois extension K of Q, there is a connection between the Galois group Gal(K/Q) and how primes of Z split in a certain subring R of K corresponding to Z in Q. When Gal(K/Q) is abelian, this connection is so stringent that the only possibility is that K is contained in a cyclotomic field. In this paper, we give an overview of field/Galois theory and what the Kronecker-Weber Theorem means. We also talk about the ring of integers R of K, how primes split in R, how splitting of primes is related to the Galois group Gal(K/Q), and finally give a proof of the Kronecker-Weber Theorem using these ideas.
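For readers who want the statement in symbols, it can be written as below, followed by a standard textbook instance. The example is not taken from the paper: the quadratic field generated by the square root of 5 is abelian over Q, and the Gauss sum identity places it inside the fifth cyclotomic field. The notation \zeta_n for a primitive n-th root of unity is introduced here for illustration.

```latex
\[
  K/\mathbb{Q}\ \text{finite and abelian}
  \;\Longrightarrow\;
  K \subseteq \mathbb{Q}(\zeta_n)\ \text{for some } n \ge 1,
  \qquad \zeta_n = e^{2\pi i/n}.
\]
\[
  \sqrt{5} \;=\; \zeta_5 - \zeta_5^{2} - \zeta_5^{3} + \zeta_5^{4}
  \quad\Longrightarrow\quad
  \mathbb{Q}(\sqrt{5}) \subseteq \mathbb{Q}(\zeta_5).
\]
```

The identity can be checked numerically: \zeta_5 + \zeta_5^4 = 2\cos(72^\circ) \approx 0.618 and \zeta_5^2 + \zeta_5^3 = 2\cos(144^\circ) \approx -1.618, whose difference is approximately 2.236.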

Investigations in phylogenetics: tree inference and model identifiability
This thesis presents two projects in mathematical phylogenetics. The first presents a new, statistically consistent, fast method for inferring species trees from topological gene trees under the multispecies coalescent model. The algorithm of this method takes a collection of unrooted topological gene trees, computes a novel intertaxon distance from them, and outputs a metric species tree. The second establishes that numerical and nonnumerical parameters of a specific Profile Mixture Model of protein sequence evolution are generically identifiable. Algebraic techniques are used, especially a theorem of Kruskal on tensor decomposition.

Multistate Ornstein-Uhlenbeck space use model reveals sex-specific partitioning of the energy landscape in a soaring bird
Understanding animals’ home range dynamics is a frequent motivating question in movement ecology. Descriptive techniques are often applied, but these methods lack predictive ability and cannot capture effects of dynamic environmental patterns, such as weather and features of the energy landscape. Here, we develop a practical approach for statistical inference into the behavioral mechanisms underlying how habitat and the energy landscape shape animal home ranges. We validated this approach by conducting a simulation study, and applied it to a sample of 12 golden eagles (Aquila chrysaetos) tracked with satellite telemetry. We demonstrate that readily available software can be used to fit a multistate Ornstein-Uhlenbeck space use model to make hierarchical inference of habitat selection parameters and home range dynamics. Additionally, the underlying mathematical properties of the model allow straightforward computation of predicted space use distributions, permitting estimation of home range size and visualization of space use patterns under varying conditions. The application to golden eagles revealed effects of habitat variables that align with eagle biology. Further, we found that males and females partition their home ranges dynamically based on uplift. Specifically, changes in wind and the angle of the sun seemed to be drivers of differential space use between sexes, in particular during late breeding season when both are foraging across large parts of their home range to support nestling growth.

Estimating confidence intervals on accuracy in classification in machine learning
This paper explores various techniques to estimate a confidence interval on accuracy for machine learning algorithms. Confidence intervals on accuracy may be used to rank machine learning algorithms. We investigate bootstrapping, leave-one-out cross-validation, and conformal prediction. These techniques are applied to the following machine learning algorithms: support vector machines, bagging AdaBoost, and random forests. Confidence intervals are produced on a total of nine datasets, three real and six simulated. We found that, in general, no technique was particularly successful at always capturing the accuracy. However, leave-one-out cross-validation was the most consistent of the techniques across all datasets.
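As an illustration of the bootstrapping idea, the sketch below builds a percentile bootstrap interval for accuracy by resampling per-example correctness indicators from a fixed test set. This is one simple variant and may differ from the procedure used in the paper, which could for instance refit the classifier on each resample. The function name and the toy outcomes are hypothetical.

```python
import random


def bootstrap_accuracy_interval(correct, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for classification accuracy.

    correct is a list of 0/1 flags, one per test example
    (1 means the classifier's prediction was right).
    """
    rng = random.Random(seed)
    n = len(correct)
    # Accuracy of each resampled test set, drawn with replacement
    accuracies = sorted(
        sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples)
    )
    alpha = (1 - level) / 2
    lower = accuracies[int(alpha * n_resamples)]
    upper = accuracies[int((1 - alpha) * n_resamples) - 1]
    return lower, upper


# Toy outcomes (hypothetical): 85 correct predictions out of 100
correct = [1] * 85 + [0] * 15
lower, upper = bootstrap_accuracy_interval(correct)
print(f"observed accuracy 0.85, 95% interval ({lower:.2f}, {upper:.2f})")
```

Because only the test-set outcomes are resampled, the interval reflects test-set sampling variability and not the variability that would come from retraining on different data.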

A geostatistical model based on Brownian motion to Krige regions in R^2 with irregular boundaries and holes
Kriging is a geostatistical interpolation method that produces predictions and prediction intervals. Classical kriging models use Euclidean (straight line) distance when modeling spatial autocorrelation. However, for estuaries, inlets, and bays, shortest-in-water distance may capture the system’s proximity dependencies better than Euclidean distance when boundary constraints are present. Shortest-in-water distance has been used to krige such regions (Little et al., 1997; Rathbun, 1998); however, the variance-covariance matrices used in these models have not been shown to be mathematically valid. In this project, a new kriging model is developed for irregularly shaped regions in R^2. This model incorporates the notion of flow-connected distance into a valid variance-covariance matrix through the use of a random walk on a lattice, process convolutions, and the nonstationary kriging equations. The model developed in this paper is compared to existing methods of spatial prediction over irregularly shaped regions using water quality data from Puget Sound.

Paving the road to college: impacts of Washington State policy on improving equitable participation in dual credit courses
This dissertation evaluates early impacts of a state policy to increase participation in dual credit courses in Washington state through subsidizing the cost of college credits for underrepresented rural and low-income students, and through extending eligibility to earn dual credit to students in grade 10. This study evaluates both aspects of the policy, with emphasis on the impacts for underrepresented rural and low-income students, students of color, and English learners. It employs quasi-experimental designs to estimate the impact of the policy on intended outcomes. The study finds mixed early impacts of the policy. While no effects were found for students attending schools near the cutoffs for eligibility for tuition subsidies, promising evidence emerged on the policy's impact on participation in dual credit among students in grade 10. The findings can provide policymakers with early evidence of the policy's effects, identify places where implementation may be strengthened, and serve as a blueprint for ongoing monitoring of the policy's impact and similar evaluations of dual credit policies nationwide.

An exploration of two infinite families of snarks
In this paper, we generalize a single example of a snark that admits a drawing with even rotational symmetry into two infinite families using a voltage graph construction technique derived from cyclic Pseudo-Loupekine snarks. We expose an enforced chirality in coloring the underlying 5-pole that generated the known example, and use this fact to show that the members of the infinite families are in fact snarks. We explore the construction of these families in terms of the blowup construction. We show that a graph in either family with rotational symmetry of order m has automorphism group of order m · 2^(m+1). The oddness of graphs in both families is determined exactly, and shown to increase linearly with the order of rotational symmetry.

On the Klein-Gordon equation originating on a curve and applications to the tsunami runup problem
Our goal is to study the linear Klein-Gordon equation in matrix form, with initial conditions originating on a curve. This equation has applications to the Cross-Sectionally Averaged Shallow Water equations, i.e., a system of nonlinear partial differential equations used for modeling tsunami waves within narrow bays, because the general Carrier-Greenspan transform can turn the Cross-Sectionally Averaged Shallow Water equations (for shorelines of constant slope) into a particular form of the matrix Klein-Gordon equation. Thus the matrix Klein-Gordon equation governs the runup of tsunami waves along shorelines of constant slope. If the narrow bay is U-shaped, the Cross-Sectionally Averaged Shallow Water equations have a known general solution via solving the transformed matrix Klein-Gordon equation. However, the initial conditions for our Klein-Gordon equation are given on a curve. Thus our goal is to solve the matrix Klein-Gordon equation with known conditions given along a curve. Therefore we present a method to extrapolate values on a line from conditions on a curve, via the Taylor formula. Finally, to apply our solution to the Cross-Sectionally Averaged Shallow Water equations, our numerical simulations demonstrate how Gaussian and N-wave profiles affect the runup of tsunami waves within various U-shaped bays.

Species network inference under the multispecies coalescent model
Species network inference is a challenging problem in phylogenetics. In this work, we present two results on this problem. The first shows that many topological features of a level-1 network are identifiable under the network multispecies coalescent model (NMSC). Specifically, we show that one can identify from gene tree frequencies the unrooted semidirected species network, after suppressing all cycles of size less than 4. The second presents the theory behind a new, statistically consistent, practical method for the inference of level-1 networks under the NMSC. The input for this algorithm is a collection of unrooted topological gene trees, and the output is an unrooted semidirected species network.

Testing multispecies coalescent simulators with summary statistics
The multispecies coalescent model (MSC) is increasingly used in phylogenetics to describe the formation of gene trees (depicting the direct ancestral relationships of sampled lineages) within species trees (depicting the branching of species from their common ancestor). A number of MSC simulators have been implemented, and these are often used to test inference methods built on the model. However, it is not clear from the literature that these simulators are always adequately tested. In this project, we formulate tools for testing these simulators and use them to show that, of four well-known coalescent simulators (Mesquite, HybridLambda, SimPhy, and Phybase), only SimPhy performs correctly according to these tests.