November 30, 2025
For our final project, we are asked to use the statistical methods taught in this course to draw objective findings from a dataset. I chose a dataset of student exam scores. It contains 200 observations (students) and 6 columns: student ID, hours studied, sleep hours, attendance percentage, previous scores, and exam score. I will first focus on exam scores and sleep hours to determine whether sleep influences students' test scores.
Hypotheses (two-sided test):
H0 (null): Sleep hours do not affect exam scores.
H1 (alternative): Sleep hours do affect exam scores (in either direction).
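One way to make these hypotheses precise is as a test on the slope of a simple linear regression. This assumes sleep hours enter as a single numeric predictor; the report does not show the exact model formula.

```latex
\begin{aligned}
\text{ExamScore}_i &= \beta_0 + \beta_1\,\text{SleepHours}_i + \varepsilon_i, \qquad i = 1, \dots, n \\
H_0 &: \beta_1 = 0 \\
H_1 &: \beta_1 \neq 0
\end{aligned}
```

Because H1 allows the slope to be either positive or negative, the test is two-sided.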
This study examines how the amount of sleep a student gets relates to their exam performance, and in particular whether more sleep is linked to higher or lower exam scores. Earlier in the course we used methods such as ANOVA and regression to test how a single factor influences an outcome. Here we apply the same approach to sleep hours and exam results.
I ran an ANOVA to test whether sleep hours affect students' exam scores. The result was statistically significant (p = 0.0076). Since p < 0.01, we reject the null hypothesis (H0): in our sample, there is evidence that sleep hours are related to exam performance. However, the scatter plot with the fitted regression line shows only a weak relationship. The points are widely spread and the trend line is fairly flat. So although the effect is statistically significant, sleep hours alone are probably not a strong predictor of exam scores, and other factors such as study hours or attendance may play a bigger role.
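The ANOVA F-test for a single numeric predictor can be sketched in Python as below. This is a minimal sketch: it assumes sleep hours were used as a numeric variable rather than grouped into categories, the original analysis was done in R, and the function name `regression_anova` and the five data points are made up for illustration. The function also returns R², the share of score variance explained by sleep. R² is what distinguishes "statistically significant" from "strong": a large sample can give a small p-value even when R² is low.

```python
from statistics import fmean

def regression_anova(x, y):
    """ANOVA F-test for one numeric predictor (simple linear regression)."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    # Least-squares slope and intercept
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    # Split total variation into explained (model) and unexplained (residual) parts
    ss_model = sum((f - y_bar) ** 2 for f in fitted)
    ss_resid = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    df_resid = n - 2
    # F = (SS_model / 1) / (SS_resid / (n - 2)); its p-value is the upper tail of
    # F(1, n - 2), e.g. scipy.stats.f.sf(f_stat, 1, df_resid) if SciPy is available
    f_stat = (ss_model / 1) / (ss_resid / df_resid)
    r_squared = ss_model / (ss_model + ss_resid)
    return slope, f_stat, df_resid, r_squared

# Made-up numbers for illustration only (not the project's dataset)
sleep_hours = [5, 6, 7, 8, 9]
exam_score = [60, 70, 75, 70, 75]
print(regression_anova(sleep_hours, exam_score))
```

With only one predictor, this F-test is equivalent to the two-sided t-test on the slope (F = t²), so it tests exactly the H0 and H1 stated for this study.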
This raises a natural question: which of the four variables (hours studied, sleep hours, attendance percentage, or previous scores) has the strongest influence on exam performance?
To answer this, I ran a multiple regression with all four predictors. I used standardized (z-scored) variables so that the coefficients are on the same scale and can be compared directly.
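The mechanics of this step can be sketched in Python with NumPy. This sketch assumes all four predictors are numeric columns. The original analysis was done in R, and the function name, column names, and randomly generated data below are hypothetical.

```python
import numpy as np

def standardized_coefficients(X, y, names):
    """OLS slopes after z-scoring every predictor and the outcome."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # z-score with the sample SD (ddof=1), as R's scale() does
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # Centered data give an intercept of exactly 0, so no intercept column is needed
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return dict(zip(names, beta))

# Randomly generated data for illustration only (not the project's dataset)
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + 0.8 * X[:, 2] + 0.7 * X[:, 3] + rng.normal(size=n)
names = ["hours_studied", "previous_scores", "attendance", "sleep_hours"]
coefs = standardized_coefficients(X, y, names)

# Rank predictors by the size of their standardized effect
for name in sorted(coefs, key=lambda k: abs(coefs[k]), reverse=True):
    print(f"{name:16s} {coefs[name]:.3f}")
```

An equivalent route is to fit the regression on the raw data and multiply each slope by sd(predictor) / sd(exam score). Both give the same standardized coefficients.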
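Hours studied has the largest standardized coefficient (0.739), so it has the strongest influence. Holding the other variables fixed, a one-standard-deviation increase in hours studied is associated with an increase of about 0.74 standard deviations in exam score. Next come previous scores (0.409), attendance (0.228), and sleep hours (0.210). All four predictors are statistically significant (p < 0.05), so each contributes to exam scores, but hours studied dominates.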
Taken together, the standardized coefficients and the visualizations indicate that study habits and prior performance matter much more than sleep or attendance. The coefficient for hours studied is more than three times those for attendance and sleep, and the coefficient for previous scores is roughly twice as large as either. So while sleep and attendance do matter, focusing on study hours and building on prior knowledge has the biggest effect on exam outcomes.
We can visualize this by using ggplot as shown below:
This study examined what affects students' exam scores, focusing on sleep hours, hours studied, attendance, and previous scores. We first tested whether sleep mattered and found a statistically significant effect, so we rejected the null hypothesis that it has no impact. However, when all four factors were compared together, sleep turned out to have the smallest influence. Hours studied had the biggest effect on exam performance, followed by previous scores and attendance. This shows that while getting enough sleep and attending class help, putting time into studying and building on what you already know make the biggest difference in exam results. Overall, the study highlights which student habits matter most for doing well on exams.