Hypothesis Testing of Movie Ratings Data
Bess Yang (qy561@nyu.edu), Iris Lu (hl5679@nyu.edu), Leo Zhao (yz9820@nyu.edu)
Project Overview
This project focused on testing several hypotheses regarding movie ratings data from 1097 participants across 400 movies. The dataset included various behavioral and demographic attributes. We applied a variety of statistical techniques, including independent t-tests, ANOVA, and non-parametric tests, to investigate factors such as gender differences in movie enjoyment, sibling influences, and the consistency of quality in movie franchises.
Languages, Platforms, and Tools
- Languages: Python
- Platforms: Jupyter Notebooks, GitHub
- Libraries and Tools:
  - Pandas, NumPy (Data Processing)
  - Scikit-learn (Statistical Modeling)
  - Matplotlib, Seaborn (Data Visualization)
  - SciPy (Statistical Testing)
My Contributions
- Question 5: I investigated whether only children enjoy The Lion King (1994) more than people with siblings. An independent samples t-test found no significant difference in enjoyment between the two groups.
- Question 6: I measured the proportion of movies that exhibit an "only child effect," meaning that viewers with and without siblings rate them differently. Only 0.5% of the movies in the dataset showed such an effect.
- Question 10: I ran ANOVA and Kruskal-Wallis tests to examine whether the quality of movies within popular franchises (Star Wars, Harry Potter, etc.) was consistent across films. Seven of the eight franchises showed significant variation in quality as perceived by viewers.
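The three analyses above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual notebook code. The data layout (one column of ratings per movie plus a 0/1 only-child indicator), the function names, the placeholder film titles, and the significance level `ALPHA = 0.05` are all assumptions made for the example. Only the SciPy calls (`ttest_ind`, `f_oneway`, `kruskal`) are real library functions.

```python
import numpy as np
import pandas as pd
from scipy import stats

ALPHA = 0.05  # assumed threshold; the project's actual significance level is not stated


def compare_groups(ratings: pd.Series, is_only_child: pd.Series) -> float:
    """Two-sided independent samples t-test p-value (Question 5 style)."""
    only = ratings[is_only_child == 1].dropna()  # ratings from only children
    sibs = ratings[is_only_child == 0].dropna()  # ratings from people with siblings
    return float(stats.ttest_ind(only, sibs).pvalue)


def proportion_with_effect(movies: pd.DataFrame, is_only_child: pd.Series,
                           alpha: float = ALPHA) -> float:
    """Share of movies whose two groups differ at the given alpha (Question 6 style)."""
    pvalues = np.array([compare_groups(movies[title], is_only_child)
                        for title in movies.columns])
    return float(np.mean(pvalues < alpha))


def franchise_consistency(movies: pd.DataFrame, titles: list) -> tuple:
    """ANOVA and Kruskal-Wallis p-values across the films of one franchise (Question 10 style)."""
    samples = [movies[title].dropna() for title in titles]
    anova_p = float(stats.f_oneway(*samples).pvalue)
    kruskal_p = float(stats.kruskal(*samples).pvalue)
    return anova_p, kruskal_p


# Synthetic stand-in data: 200 hypothetical viewers, three hypothetical films.
rng = np.random.default_rng(0)
n_viewers = 200
is_only_child = pd.Series(rng.integers(0, 2, size=n_viewers))
movies = pd.DataFrame({
    "Film A": rng.normal(3.0, 0.5, size=n_viewers),
    "Film B": rng.normal(3.0, 0.5, size=n_viewers),
    "Film C": rng.normal(2.0, 0.5, size=n_viewers),  # deliberately rated lower
})

print("t-test p-value for Film A:", compare_groups(movies["Film A"], is_only_child))
print("Proportion of films with an effect:", proportion_with_effect(movies, is_only_child))
print("ANOVA / Kruskal-Wallis p-values:", franchise_consistency(movies, ["Film A", "Film C"]))
```

Running both ANOVA and Kruskal-Wallis on the same samples gives a parametric and a rank-based view of the same question, which is useful when ratings are not clearly normally distributed. For a directional hypothesis such as "only children enjoy the film more," `ttest_ind` also accepts `alternative="greater"` in recent SciPy versions.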

