
Undergraduate Projects
Below is a detailed compilation of some of my favorite statistics projects I completed while at Carleton College. These projects are products of the following classes I took at Carleton:
- Senior Thesis
- Spatial Statistics
- Sampling in Statistics
- Statistical Inference
- Applied Linear Regression
- Data Science
- Introduction to Statistics

A Comparison of Spatial Models Incorporating Nonspatial Information, with a Policing Case Study
For my senior thesis, my peers Sammi Sheridan, Miles Frisch, and Evan Christensen and I compared spatial models that incorporate nonspatial information, using a policing case study.
This paper won 1st place in the 2024 Spring USRESP Competition, an undergraduate statistics research competition sponsored by the American Statistical Association.
The paper can be found and downloaded here:

An Analysis of COVID-19 Vaccination Rates as of April 2022 in Colorado and Minnesota
For this project, I analyzed COVID-19 vaccination rates as of April 2022 in Colorado and Minnesota. I explored how differing vaccination policies in Colorado and Minnesota result in differing impacts of socio-economic factors and voting patterns on vaccination rates. To execute this analysis, I tested for complete spatial randomness in the residuals of linear regression models and then fit spatial regression models with the percent of people fully vaccinated as the response variable.
The paper can be found and downloaded here:

Investigating the Use of Alcohol and Marijuana in the United States
My peer Adriana Wiggins and I estimated the rates of alcohol and marijuana use to investigate how the stigmas surrounding these substances affect their popularity. To complete this analysis, we used data from the 2021 National Survey on Drug Use and Health.
The paper and R Appendix can be found and downloaded here:

How Many Words Starting with "h" Does Helen Know? -- Utilizing Ratio Estimation to Work With Two Variability Components
My peer Sammi Sheridan and I used simple random sampling to estimate the total number of words starting with an "h" that I know. We then used ratio estimation to estimate the proportion of words starting with an "h" that I know.
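As a rough illustration of ratio estimation under simple random sampling, the sketch below computes a ratio estimate and its approximate standard error. The sampling design (dictionary pages as sampling units), the function name `ratio_estimate`, and every count in the example are hypothetical and are not taken from the paper, whose own analysis was done in R.

```python
from math import sqrt


def ratio_estimate(y, x, population_size):
    """Ratio estimate r = sum(y) / sum(x) with an approximate standard error.

    y[i] and x[i] are paired counts on the i-th sampled unit, drawn by
    simple random sampling from `population_size` units.
    """
    n = len(y)
    r = sum(y) / sum(x)
    x_bar = sum(x) / n
    # Residuals around the fitted ratio line y = r * x.
    residuals = [yi - r * xi for yi, xi in zip(y, x)]
    s2_e = sum(e ** 2 for e in residuals) / (n - 1)
    # Finite population correction for sampling without replacement.
    fpc = 1 - n / population_size
    se = sqrt(fpc * s2_e / (n * x_bar ** 2))
    return r, se


# Hypothetical data: "h" words known (y) and "h" words listed (x) on each
# of five sampled pages, out of an assumed 60 pages of "h" words.
known = [12, 9, 15, 11, 8]
listed = [20, 18, 22, 19, 16]
r, se = ratio_estimate(known, listed, population_size=60)
print(f"estimated proportion known: {r:.3f} (SE {se:.3f})")
```

The standard error uses the usual linearization approximation for a ratio estimator, so it is most trustworthy when the sample is not very small.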
The paper and R Appendix can be found and downloaded here:

Estimating Variance for the Distribution of Zebra Mussels in Lake Bergen
My peer Kaitlyn Peterson and I compared the method of moments (MOM) estimator and the maximum likelihood estimator (MLE) for sigma^2. After determining that the MLE is more accurate and less variable, we used it to estimate the distribution of zebra mussels in Lake Bergen.
The paper and R Appendix can be found and downloaded here:

Modeling the Association Between Low-Income and Minority Students on Graduation Rate in California High School Districts
My peers Ben Griesel and Elena Ea and I analyzed the association between low-income and minority students on graduation rate in California high school districts. To investigate this relationship, we fit a multiple linear regression model with graduation rate as our outcome variable and binary race (black or non-black), binary free and reduced lunch, binary size (small or large), and student-to-teacher ratio as our predictor variables.
The paper and R Appendix can be found and downloaded here:

Modeling the Relationship Between Alcohol Abuse in Young Adults and Their Level of Happiness at Home
My peers Aiden Chang and Zack Dong and I analyzed the relationship between alcohol abuse in young adults and their level of happiness at home. To investigate this relationship, we constructed a logistic regression model with an indicator of whether any drinks were consumed in the past thirty days as our response variable and home happiness, presence of alcohol problems in the home, and time spent unsupervised by an adult as our explanatory variables. Multiple Wald tests and 95% confidence intervals yielded statistically discernible evidence in favor of an association between the odds of a young adult abusing alcohol and various home happiness factors.
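For readers unfamiliar with Wald inference in logistic regression, here is a minimal sketch of how a single coefficient's Wald test and odds-ratio confidence interval are typically computed. The function name `wald_summary` and the coefficient and standard error in the example are hypothetical; they are not results from the paper, whose analysis was done in R.

```python
from math import exp
from statistics import NormalDist


def wald_summary(beta_hat, std_error, level=0.95):
    """Wald z-test and odds-ratio confidence interval for one coefficient."""
    std_normal = NormalDist()
    z = beta_hat / std_error
    # Two-sided p-value from the standard normal reference distribution.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1 - (1 - level) / 2)
    lower = beta_hat - z_crit * std_error
    upper = beta_hat + z_crit * std_error
    # Exponentiating moves from the log-odds scale to the odds-ratio scale.
    return {
        "z": z,
        "p_value": p_value,
        "odds_ratio": exp(beta_hat),
        "odds_ratio_ci": (exp(lower), exp(upper)),
    }


# Hypothetical log-odds coefficient and standard error, for illustration only.
print(wald_summary(beta_hat=-0.40, std_error=0.15))
```

An odds-ratio interval that excludes 1 corresponds to a Wald test that rejects at the matching significance level.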
The paper and R Appendix can be found and downloaded here:

Modeling the Relationship Between GDP Growth and Population Growth in Africa
My peers Aiden Chang and Zack Dong and I analyzed the relationship between GDP growth and population growth in Africa. To investigate this relationship, we constructed a multiple linear regression model with GDP growth as our response variable, population growth as our primary explanatory variable of interest, and inflation, development assistance, internet, and barter as our control explanatory variables. A t-test and 95% confidence interval suggested a slight positive linear relationship between GDP growth and population growth, but the evidence was not statistically discernible.
The paper and R Appendix can be found and downloaded here:

Modeling the Relationship Between Paper Helicopter Wing Length and Drop Time
My peers Ben Griesel and Elena Ea and I analyzed the relationship between paper helicopter wing length and drop time. We compared the mean drop time across four wing-length groups (5.5, 6.5, 7.5, and 8.5 inches; n = 20) with an ANOVA F test. Our analysis of the data yielded strong evidence that, among the lengths tested, an 8.5-inch wing produces the longest flight time.
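The one-way ANOVA F statistic behind a comparison like this can be sketched in a few lines. The function name `one_way_anova_f` and the drop times below are made up for illustration and are not the data from the paper.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical drop times in seconds, one list per wing length.
drop_times = [
    [1.8, 1.9, 2.0, 1.7, 1.9],  # 5.5 inches
    [2.0, 2.1, 2.2, 2.0, 2.1],  # 6.5 inches
    [2.2, 2.3, 2.1, 2.4, 2.3],  # 7.5 inches
    [2.5, 2.6, 2.4, 2.7, 2.5],  # 8.5 inches
]
f_stat, df_b, df_w = one_way_anova_f(drop_times)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F statistic only indicates that at least one group mean differs; identifying which wing length is best requires follow-up pairwise comparisons.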
The paper can be found and downloaded here:

COVID-19 Dashboard R Shiny Application
My peer Ben Griesel and I created a Shiny application that displays a COVID-19 dashboard. By interacting with this application, a user can explore various COVID-19 statistics across several countries. For example, the user can plot total COVID-19 cases over time for a country of their choosing and compare it with multiple other countries. The user can also view similar plots for total COVID-19 deaths and the proportion of people vaccinated.
The R Shiny application can be found at this link:

Oh How Times Change in a 100 Mile Ultramarathon (or don’t)
My peers Helen Cross and Hazel DeHarpporte and I analyzed whether race times for ultramarathons were increasing or decreasing over time. We used data from the Leadville Trail 100 Run from the years 2012 to 2019. Our analysis included a one-way ANOVA test of the mean time for the race in each year. Additionally, we calculated confidence intervals for the difference between mean male times and mean female times.
The paper and R Appendix can be found and downloaded here: