Browsing Mathematics and Statistics by Title
Now showing items 31-48 of 48

Multiple imputation of missing multivariate atmospheric chemistry time series data from Denali National Park
This paper explores a technique for imputing missing values in an incomplete dataset via multiple imputation. Incomplete data is one of the most common issues in data analysis and often occurs when measuring chemical and environmental data. The dataset that we used in the model consists of 26 atmospheric particulates or elements that were measured semiweekly in Denali National Park from 1988 to 2015. Collection days alternated between three and four days apart from 3/2/88 to 9/30/00, and samples were collected consistently every three days from 10/3/00 to 12/29/15. For this reason, the data were initially partitioned into two datasets in case the spacing between collection days would have an impact. With further analysis, we concluded that the misalignments between the two datasets had very little or no impact on our analysis and therefore combined the two. After running five Markov chains of 1000 iterations, we concluded that the model stayed consistent across the five chains. We found that assessing how well the imputed values performed would require more exploratory analysis of the imputed datasets.

Multistate Ornstein-Uhlenbeck space use model reveals sex-specific partitioning of the energy landscape in a soaring bird
Understanding animals’ home range dynamics is a frequent motivating question in movement ecology. Descriptive techniques are often applied, but these methods lack predictive ability and cannot capture effects of dynamic environmental patterns, such as weather and features of the energy landscape. Here, we develop a practical approach for statistical inference into the behavioral mechanisms underlying how habitat and the energy landscape shape animal home ranges. We validated this approach by conducting a simulation study, and applied it to a sample of 12 golden eagles (Aquila chrysaetos) tracked with satellite telemetry. We demonstrate that readily available software can be used to fit a multistate Ornstein-Uhlenbeck space use model to make hierarchical inference of habitat selection parameters and home range dynamics. Additionally, the underlying mathematical properties of the model allow straightforward computation of predicted space use distributions, permitting estimation of home range size and visualization of space use patterns under varying conditions. The application to golden eagles revealed effects of habitat variables that align with eagle biology. Further, we found that males and females partition their home ranges dynamically based on uplift. Specifically, changes in wind and the angle of the sun seemed to be drivers of differential space use between sexes, in particular during late breeding season when both are foraging across large parts of their home range to support nestling growth.

Non-Normality In Scalar Delay Differential Equations
Analysis of stability for delay differential equations (DDEs) is a tool in a variety of fields, such as nonlinear dynamics in physics, biology, and chemistry, as well as engineering and pure mathematics. Stability analysis is based primarily on the eigenvalues of a discretized system. Situations exist in which practical and numerical results may not match the expected stability inferred from such approaches. The reasons and mechanisms for this behavior can be related to the eigenvectors associated with the eigenvalues. When the operator associated with a linear (or linearized) DDE is significantly non-normal, the stability analysis must be adapted as demonstrated here. Example DDEs are shown to have solutions which exhibit transient growth not accounted for by eigenvalues alone. Pseudospectra are computed and related to transient growth.
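The transient growth described in this abstract already shows up in finite dimensions. The following is a minimal sketch for a 2x2 linear system x' = Ax rather than a DDE; the matrix entries, the initial condition, and the time grid are illustrative assumptions and are not taken from the work. Both eigenvalues are negative, yet the solution norm grows by an order of magnitude before it decays.

```python
import math

# Illustrative non-normal matrix A = [[a, b], [0, d]] (values are assumptions).
# Its eigenvalues are a = -1 and d = -2, so every solution of x' = A x decays eventually.
A_DIAG_1, A_OFF, A_DIAG_2 = -1.0, 50.0, -2.0


def solution_norm(t, x0=(0.0, 1.0)):
    """Euclidean norm of x(t) = exp(tA) x0, using the closed form for a 2x2 upper-triangular A."""
    e1 = math.exp(A_DIAG_1 * t)
    e2 = math.exp(A_DIAG_2 * t)
    # Off-diagonal entry of exp(tA): b * (e^{at} - e^{dt}) / (a - d)
    off = A_OFF * (e1 - e2) / (A_DIAG_1 - A_DIAG_2)
    x1 = e1 * x0[0] + off * x0[1]
    x2 = e2 * x0[1]
    return math.hypot(x1, x2)


times = [0.05 * k for k in range(401)]  # t in [0, 20]
norms = [solution_norm(t) for t in times]
peak = max(norms)
print(f"initial norm = {norms[0]:.3f}, peak norm = {peak:.3f}, final norm = {norms[-1]:.2e}")
```

The large off-diagonal entry makes the eigenvectors nearly parallel, which is the mechanism that eigenvalues alone do not reveal.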

Numerical realization of the generalized Carrier-Greenspan Transform for the shallow water wave equations
We study the development of two numerical algorithms for long nonlinear wave runup that utilize the generalized Carrier-Greenspan transform. The Carrier-Greenspan transform is a hodograph transform that allows the Shallow Water Wave equations to be transformed into a linear second order wave equation with non-constant coefficients. In both numerical algorithms the transform is numerically implemented, the resulting linear system is numerically solved, and then the inverse transformation is implemented. The first method we develop is based on an implicit finite difference method and is applicable to constantly sloping bays of arbitrary cross-section. The resulting scheme is extremely fast and shows promise as a fast tsunami runup solver for wave runup in coastal fjords and narrow inlets. For the second scheme, we develop an initial value boundary problem corresponding to an inclined bay with U- or V-shaped cross-sections that has a wall some distance from the shore. A spectral method is applied to the resulting linear equation in order to find a series solution. Both methods are verified against an analytical solution in an inclined parabolic bay with positive results, and the first scheme is compared to the 3D numerical solver FUNWAVE with positive results.

On the Klein-Gordon equation originating on a curve and applications to the tsunami runup problem
Our goal is to study the linear Klein-Gordon equation in matrix form, with initial conditions originating on a curve. This equation has applications to the Cross-Sectionally Averaged Shallow Water equations, i.e. a system of nonlinear partial differential equations used for modeling tsunami waves within narrow bays, because the general Carrier-Greenspan transform can turn the Cross-Sectionally Averaged Shallow Water equations (for shorelines of constant slope) into a particular form of the matrix Klein-Gordon equation. Thus the matrix Klein-Gordon equation governs the runup of tsunami waves along shorelines of constant slope. If the narrow bay is U-shaped, the Cross-Sectionally Averaged Shallow Water equations have a known general solution via solving the transformed matrix Klein-Gordon equation. However, the initial conditions for our Klein-Gordon equation are given on a curve. Thus our goal is to solve the matrix Klein-Gordon equation with known conditions given along a curve. Therefore we present a method to extrapolate values on a line from conditions on a curve, via the Taylor formula. Finally, to apply our solution to the Cross-Sectionally Averaged Shallow Water equations, our numerical simulations demonstrate how Gaussian and N-wave profiles affect the runup of tsunami waves within various U-shaped bays.

Phylogenetic trees and Euclidean embeddings
In this thesis we develop an intuitive process of encoding any phylogenetic tree and its associated tree-distance matrix as a collection of points in Euclidean space. Using this encoding, we find that information about the structure of the tree can easily be recovered by applying the inner product operation to vector combinations of the Euclidean points. By applying Classical Scaling to the tree-distance matrix, we are able to find the Euclidean points even when the phylogenetic tree is not known. We use the insight gained by encoding the tree as a collection of Euclidean points to modify the Neighbor Joining Algorithm, a method to recover an unknown phylogenetic tree from its tree-distance matrix, to be more resistant to tree-distance proportional errors.
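As background for the Neighbor Joining step mentioned in this abstract, here is a minimal sketch of the pair-selection rule in the standard algorithm (the Q-criterion). It shows the unmodified method only, not the modification developed in the thesis. The function name `nj_first_join` and the 5-taxon distance matrix are illustrative assumptions; the matrix is a tree metric in which taxa 0 and 1 form a cherry.

```python
def nj_first_join(dist):
    """Return (i, j, q) for the pair minimizing the Neighbor Joining Q-criterion."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k)
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[2]:
                best = (i, j, q)
    return best


# Illustrative additive (tree) distance matrix on five taxa.
tree_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_first_join(tree_dist))  # prints (0, 1, -50)
```

Note that taxa 3 and 4 are closer to each other (distance 3) than taxa 0 and 1 (distance 5), yet the criterion still selects a true cherry because it corrects for each taxon's overall distance to the rest.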

Reliability analysis of reconstructing phylogenies under long branch attraction conditions
In this simulation study we examined the reliability of three phylogenetic reconstruction techniques in a long branch attraction (LBA) situation: Maximum Parsimony (MP), Neighbor Joining (NJ), and Maximum Likelihood. Data were simulated under five DNA substitution models (JC, K2P, F81, HKY, and GTR) from four different taxa. Two branch length parameters of four-taxon trees, ranging from 0.05 to 0.75 with an increment of 0.02, were used to simulate DNA data under each model. For each model we simulated DNA sequences with 100, 250, 500 and 1000 sites with 100 replicates. When we have enough data, the maximum likelihood technique is the most reliable of the three methods examined in this study for reconstructing phylogenies under LBA conditions. We also find that MP is the most sensitive to LBA conditions and that Neighbor Joining performs well under LBA conditions compared to MP.

Simulating distance sampling to estimate nest abundance on the Yukon-Kuskokwim Delta, Alaska
The U.S. Fish and Wildlife Service currently conducts annual surveys to estimate bird nest abundance on the Yukon-Kuskokwim Delta, Alaska. The current method involves intensive searching on large plots with the goal of finding every nest on the plot. Distance sampling is a well-established transect-based method to estimate density or abundance that accounts for imperfect detection of objects. It relies on estimating the probability of detecting an object given its distance from the transect line, or the detection function. Simulations were done using R to explore whether distance sampling methods on the Yukon-Kuskokwim Delta could produce reliable estimates of nest abundance. Simulations were executed both with geographic strata based on estimated Spectacled Eider (Somateria fischeri) nest densities and without stratification. Simulations with stratification where more effort was allotted to high density areas tended to be more precise, but lacked the property of pooling robustness and assumed stratum boundaries would not change over time. Simulations without stratification yielded estimates with relatively low bias and variances comparable to current estimation methods. Distance sampling appears to be a viable option for estimating the abundance of nests on the Yukon-Kuskokwim Delta.
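The idea of a detection function can be illustrated with a small simulation. The study itself used R; the sketch below uses Python and is not the authors' code. It assumes a half-normal detection function, nests spread uniformly within a strip of half-width 50 around the transect, and negligible truncation; all parameter values and function names are hypothetical.

```python
import math
import random


def simulate_survey(n_nests, half_width, sigma, rng):
    """Return perpendicular distances of detected nests under a half-normal detection function."""
    detected = []
    for _ in range(n_nests):
        x = rng.uniform(0.0, half_width)           # distance from the transect line
        g = math.exp(-x * x / (2.0 * sigma ** 2))  # detection probability at distance x
        if rng.random() < g:
            detected.append(x)
    return detected


def estimate_abundance(distances, half_width):
    """Half-normal estimator with negligible truncation: N_hat = n * w / mu_hat."""
    n = len(distances)
    sigma_hat = math.sqrt(sum(x * x for x in distances) / n)  # MLE of sigma
    mu_hat = sigma_hat * math.sqrt(math.pi / 2.0)             # effective strip half-width
    return n * half_width / mu_hat


rng = random.Random(1)
true_n = 20000
distances = simulate_survey(true_n, half_width=50.0, sigma=10.0, rng=rng)
n_hat = estimate_abundance(distances, half_width=50.0)
print(f"detected {len(distances)} of {true_n}; estimated abundance = {n_hat:.0f}")
```

Only about a quarter of the nests in the strip are detected in this setup, yet the estimator recovers the total because it scales the count by the estimated effective strip width.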

Species network inference under the multispecies coalescent model
Species network inference is a challenging problem in phylogenetics. In this work, we present two results on this problem. The first shows that many topological features of a level-1 network are identifiable under the network multispecies coalescent model (NMSC). Specifically, we show that one can identify from gene tree frequencies the unrooted semidirected species network, after suppressing all cycles of size less than 4. The second presents the theory behind a new, statistically consistent, practical method for the inference of level-1 networks under the NMSC. The input for this algorithm is a collection of unrooted topological gene trees, and the output is an unrooted semidirected species network.

Statistical analysis of species tree inference
It is known that the STAR and USTAR algorithms are statistically consistent techniques used to infer species tree topologies from a large set of gene trees. However, if the set of gene trees is small, the accuracy of STAR and USTAR in determining species tree topologies is unknown. Furthermore, it is unknown how introducing roots on the gene trees affects the performance of STAR and USTAR. Therefore, we show that when given a set of gene trees of sizes 1, 3, 6 or 10, the STAR and USTAR algorithms with Neighbor Joining perform relatively well for two different cases: one where the gene trees are rooted at the outgroup and the STAR inferred species tree is also rooted at the outgroup, and the other where the gene trees are not rooted at the outgroup, but the USTAR inferred species tree is rooted at the outgroup.

A study of saturation number
This paper seeks to provide complete proofs in modern notation of (early) key saturation number results and give some new results concerning the semisaturation number. We highlight relevant results from extremal theory and present the saturation number for the complete graph K_k and the star K_{1,t}, elaborating on the proofs provided in the 1964 paper "A Problem in Graph Theory" by Erdős, Hajnal and Moon and the 1986 paper "Saturated Graphs with Minimal Number of Edges" by Kászonyi and Tuza. We discuss the proof of a general bound on the saturation number for a family of target graphs provided by Kászonyi and Tuza. A discussion of related results showing that the complete graph has the maximum saturation number among target graphs of the same order and that the star has the maximum saturation number among target trees of the same order is included. Before presenting our result concerning the semisaturation number for the path P_k, we discuss the structure of some P_k-saturated trees of large order as well as the saturation number of P_k with respect to host graphs of large order.
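For reference, the Erdős, Hajnal and Moon result for the complete graph is usually stated as below for n ≥ k ≥ 2. The notation sat(n, K_k) for the saturation number on n vertices is the common convention and is assumed here rather than taken from the paper. The minimum is attained by joining a complete graph on k - 2 vertices to an independent set on the remaining n - k + 2 vertices.

```latex
\operatorname{sat}(n, K_k) \;=\; (k-2)(n-k+2) + \binom{k-2}{2} \;=\; \binom{n}{2} - \binom{n-k+2}{2}
```

For k = 3 this gives n - 1, the number of edges of a star, which is the smallest triangle-saturated graph on n vertices.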

Testing multispecies coalescent simulators with summary statistics
The multispecies coalescent model (MSC) is increasingly used in phylogenetics to describe the formation of gene trees (depicting the direct ancestral relationships of sampled lineages) within species trees (depicting the branching of species from their common ancestor). A number of MSC simulators have been implemented, and these are often used to test inference methods built on the model. However, it is not clear from the literature that these simulators are always adequately tested. In this project, we formulated tools for testing these simulators and used them to show that of four well-known coalescent simulators, Mesquite, HybridLambda, SimPhy, and Phybase, only SimPhy performs correctly according to these tests.
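The general idea of checking a simulator against a summary statistic with a known theoretical value can be sketched as follows. This is a minimal illustration for the single-population Kingman coalescent, not the multispecies model and not the tests developed in the project. The function name, sample size, seed, and replicate count are assumptions; time is measured in coalescent units, in which each pair of lineages coalesces at rate 1.

```python
import random


def simulate_tmrca(n_lineages, rng):
    """Time to the most recent common ancestor under the Kingman coalescent (coalescent units)."""
    t = 0.0
    for k in range(n_lineages, 1, -1):
        rate = k * (k - 1) / 2.0      # number of lineage pairs that can coalesce
        t += rng.expovariate(rate)    # waiting time until the next coalescence
    return t


rng = random.Random(2024)
n, reps = 5, 20000
mean_tmrca = sum(simulate_tmrca(n, rng) for _ in range(reps)) / reps
expected = 2.0 * (1.0 - 1.0 / n)      # theoretical mean, 1.6 for n = 5
print(f"simulated mean = {mean_tmrca:.3f}, theoretical mean = {expected:.3f}")
```

A simulator whose mean drifts well outside the sampling error of this statistic would fail the check, which is the kind of discrepancy such tests are designed to expose.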

The linear algebra of interpolation with finite applications giving computational methods for multivariate polynomials
Linear representation and the duality of the biorthonormality relationship express the linear algebra of interpolation by way of the evaluation mapping. In the finite case the standard bases relate the maps to Gramian matrices. Five equivalent conditions on these objects are found which characterize the solution of the interpolation problem. This algebra succinctly describes the solution space of ordinary linear initial value problems. Multivariate polynomial spaces and multidimensional node sets are described by multi-index sets. Geometric considerations of normalization and dimensionality lead to cardinal bases for Lagrange interpolation on regular node sets. More general Hermite functional sets can also be solved by generalized Newton methods using geometry and multi-indices. Extended to countably infinite spaces, the method calls upon theorems of modern analysis.

Toward an optimal solver for the obstacle problem
An optimal algorithm for solving a problem with m degrees of freedom is one that computes a solution in O(m) time. In this paper, we discuss a class of optimal algorithms for the numerical solution of PDEs called multigrid methods. We go on to examine numerical solvers for the obstacle problem, a constrained PDE, with the goal of demonstrating optimality. We discuss two known algorithms, the so-called reduced space method (RSP) [BM03] and the multigrid-based projected full-approximation scheme (PFAS) [BC83]. We compare the performance of PFAS and RSP on a few example problems, finding numerical evidence of optimality or near-optimality for PFAS.
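To make the obstacle problem concrete, the following is a minimal sketch of projected Gauss-Seidel on a one-dimensional model problem. It is a simple single-grid iteration of the kind commonly used as a smoother in projected multigrid schemes; it is neither PFAS nor RSP, and it is not O(m). The model problem (-u'' = f on (0, 1) with u ≥ ψ and zero boundary values), the parabolic obstacle, and the grid size are illustrative assumptions.

```python
def projected_gauss_seidel(psi, f, h, sweeps):
    """Solve the discrete 1D obstacle problem -u'' = f, u >= psi, u(0) = u(1) = 0."""
    m = len(psi)
    u = [max(p, 0.0) for p in psi]                 # feasible initial guess
    for _ in range(sweeps):
        for i in range(m):
            left = u[i - 1] if i > 0 else 0.0      # zero Dirichlet boundary values
            right = u[i + 1] if i < m - 1 else 0.0
            unconstrained = 0.5 * (left + right + h * h * f[i])
            u[i] = max(psi[i], unconstrained)      # project onto the constraint u >= psi
    return u


m = 31                                             # interior grid points
h = 1.0 / (m + 1)
xs = [(i + 1) * h for i in range(m)]
psi = [0.5 - 8.0 * (x - 0.5) ** 2 for x in xs]     # parabolic obstacle (illustrative)
f = [0.0] * m
u = projected_gauss_seidel(psi, f, h, sweeps=5000)
contact = sum(1 for ui, pi in zip(u, psi) if ui == pi)
print(f"u(0.5) = {u[m // 2]:.4f}, grid points in contact with the obstacle: {contact}")
```

The number of sweeps needed by this iteration grows with the grid size, which is the cost that multigrid methods such as PFAS aim to remove.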

The treatment of missing data on placement tools for predicting success in college algebra at the University of Alaska
This project investigated the statistical significance of baccalaureate student placement tools, such as test scores and completion of a developmental course, in predicting success in a college level algebra course at the University of Alaska (UA). Students included in the study had attempted Math 107 at UA for the first time between fiscal years 2007 and 2012. The student placement information had a high percentage of missing data. A simulation study was conducted to choose the best missing data method between complete case deletion and multiple imputation for the student data. After the missing data methods were applied, a logistic regression was fitted with explanatory variables consisting of test scores, developmental course grade, age (category) of scores and grade, and interactions. The relevant tests were SAT math, ACT math, and AccuPlacer college level math, and the relevant developmental course was Devm/Math 105. The response variable was success in passing Math 107 with a grade of C or above on the first attempt. The simulation study showed that under a high percentage of missing data and correlation, multiple imputation implemented by the R package Multivariate Imputation by Chained Equations (MICE) produced less biased estimators and better confidence interval coverage than complete case deletion when data are missing at random (MAR) and missing not at random (MNAR). Results from the multiple imputation method on the student data showed that Devm/Math 105 grade was a significant predictor of passing Math 107. The age of Devm/Math 105, age of tests, and test scores were not significant predictors of student success in Math 107. Future studies may consider modeling with ALEKS scores and high school math course information.
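After a model is fitted to each imputed dataset, the per-dataset results are typically combined with Rubin's rules. The sketch below shows that pooling step in Python; the project itself used the R package MICE, so this is an illustration of the standard formulas rather than the authors' code. The function name `pool_rubin` and the numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors using Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)            # pooled point estimate
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (sample variance)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total


# Hypothetical coefficient estimates from m = 3 imputed datasets (made-up numbers).
q_bar, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q_bar, total_var)
```

The between-imputation term is what widens the confidence interval to reflect uncertainty about the missing values, which complete case deletion does not account for.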

Tsunami runup in U and V shaped bays
Tsunami runup can be effectively modeled using the shallow water wave equations. In 1958 Carrier and Greenspan in their work "Water waves of finite amplitude on a sloping beach" used this system to model tsunami runup on a uniformly sloping plane beach. They linearized this problem using a hodograph type transformation and obtained the Klein-Gordon equation, which could be explicitly solved by using the Fourier-Bessel transform. In 2011, Efim Pelinovsky and Ira Didenkulova in their work "Runup of Tsunami Waves in U-Shaped Bays" used a similar hodograph type transformation and linearized the tsunami problem for a sloping bay with parabolic cross-section. They solved the linear system by using the D'Alembert formula. This method was generalized to sloping bays with cross-sections parameterized by power functions. However, an explicit solution was obtained only for the case of a bay with a quadratic cross-section. In this paper we will show that the Klein-Gordon equation can be solved by a spectral method for any inclined bathymetry described by a power function with any positive power. The result can be used to estimate tsunami runup in such bays with minimal numerical computations. This fact is very important because in many cases our numerical model can be substituted for full-scale numerical models, which are computationally expensive and time consuming and are not feasible for investigating tsunami behavior in the Alaskan coastal zone, given the low population density in this area.

Vertex arboricity of triangle-free graphs
The vertex arboricity of a graph is the minimum number of colors needed to color the vertices so that the subgraph induced by each color class is a forest. In other words, the vertex arboricity of a graph is the smallest number of colors required to color a graph such that every cycle has at least two colors. Although not standard, we will refer to vertex arboricity simply as arboricity. In this paper, we discuss properties of chromatic number and k-defective chromatic number and how those properties relate to the arboricity of triangle-free graphs. In particular, we find bounds on the minimum order of a graph having arboricity three. Equivalently, we consider the largest possible vertex arboricity of triangle-free graphs of fixed order.
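The definition above can be checked directly on small graphs. The following is a brute-force sketch that tries every vertex coloring, so it is only practical for a handful of vertices and is not a method from the paper. The function names and the two example graphs are illustrative assumptions.

```python
from itertools import product


def induces_forest(vertices, edges):
    """True if the subgraph induced by `vertices` is acyclic (checked with union-find)."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        if a in parent and b in parent:
            ra, rb = find(a), find(b)
            if ra == rb:          # edge closes a cycle inside this color class
                return False
            parent[ra] = rb
    return True


def vertex_arboricity(n, edges):
    """Smallest k such that vertices 0..n-1 split into k classes, each inducing a forest."""
    for k in range(1, n + 1):
        for coloring in product(range(k), repeat=n):
            classes = [[v for v in range(n) if coloring[v] == c] for c in range(k)]
            if all(induces_forest(cls, edges) for cls in classes):
                return k
    return n


cycle5 = [(i, (i + 1) % 5) for i in range(5)]                   # triangle-free
complete5 = [(i, j) for i in range(5) for j in range(i + 1, 5)]
print(vertex_arboricity(5, cycle5), vertex_arboricity(5, complete5))  # prints 2 3
```

The 5-cycle needs two colors because a single class would contain the whole cycle, while in the complete graph any three vertices form a triangle, so each class can hold at most two vertices.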