Hypothesis Testing of Movie Ratings Data

Bess Yang (qy561@nyu.edu), Iris Lu (hl5679@nyu.edu), Leo Zhao (yz9820@nyu.edu)

Project Overview

This project tested several hypotheses about movie ratings collected from 1097 participants across 400 movies. The dataset also included behavioral and demographic attributes. We applied a range of statistical techniques, including independent t-tests, ANOVA, and non-parametric tests, to investigate gender differences in movie enjoyment, sibling influences, and the consistency of quality within movie franchises.

GitHub

My specific contributions are in the file ids_project_1_bess_yang_Q5610.ipynb

Report

Data Viz

Languages, Platforms, and Tools

Languages: Python
Platforms: Jupyter Notebooks, GitHub
Libraries and Tools:
- Pandas, NumPy (Data Processing)
- Scikit-learn (Statistical Modeling)
- Matplotlib, Seaborn (Data Visualization)
- SciPy (Statistical Testing)

My Contributions

Question 5: I investigated whether people who are only children enjoy The Lion King (1994) more than those with siblings. Using an independent samples t-test, I found no significant difference between the two groups in terms of movie enjoyment.
Question 6: I explored the proportion of movies that exhibit an "only child effect," meaning they are rated differently by viewers with siblings and viewers without. My analysis revealed that only 0.5% of the movies in the dataset demonstrated such an effect.
Question 10: I performed ANOVA and Kruskal-Wallis tests to examine whether films within popular franchises (Star Wars, Harry Potter, etc.) were of consistent quality. The results showed that seven of the eight franchises had significant variation in quality across their films, as perceived by viewers.
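
The three analyses above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code from the notebook: the function names, the 0.005 significance threshold, the handling of missing ratings, and the generated ratings are all assumptions made for the example. Only the SciPy calls (ttest_ind, f_oneway, kruskal) are real library functions.

```python
import numpy as np
from scipy import stats

ALPHA = 0.005  # assumed threshold; the write-up does not state the one used


def compare_groups(ratings_a, ratings_b):
    """Independent samples t-test on two rating arrays, dropping missing values."""
    a = np.asarray(ratings_a, dtype=float)
    b = np.asarray(ratings_b, dtype=float)
    a, b = a[~np.isnan(a)], b[~np.isnan(b)]
    return stats.ttest_ind(a, b)  # SciPy default (equal variances assumed)


def proportion_with_effect(ratings, is_only_child, alpha=ALPHA):
    """Share of movie columns whose two-group test gives p < alpha."""
    p_values = [
        compare_groups(col[is_only_child], col[~is_only_child]).pvalue
        for col in ratings.T  # one column per movie
    ]
    return float(np.mean(np.array(p_values) < alpha))


def franchise_consistency(film_ratings):
    """One-way ANOVA and Kruskal-Wallis across the films of one franchise."""
    groups = [np.asarray(g, dtype=float) for g in film_ratings]
    groups = [g[~np.isnan(g)] for g in groups]
    return stats.f_oneway(*groups), stats.kruskal(*groups)


# Synthetic stand-in data (hypothetical): rows are viewers, columns are movies.
rng = np.random.default_rng(0)
n_viewers, n_movies = 200, 20
ratings = rng.normal(3.0, 0.8, size=(n_viewers, n_movies))
ratings[rng.random(ratings.shape) < 0.3] = np.nan  # simulate unrated movies
is_only_child = rng.random(n_viewers) < 0.2

# Question 5 style: one movie, only children vs. viewers with siblings.
single = compare_groups(ratings[is_only_child, 0], ratings[~is_only_child, 0])

# Question 6 style: fraction of movies showing a group difference.
share = proportion_with_effect(ratings, is_only_child)

# Question 10 style: three "films", the third deliberately shifted upward.
anova, kruskal = franchise_consistency(
    [ratings[:, 1], ratings[:, 2], ratings[:, 3] + 1.0]
)

print(f"single-movie p-value: {single.pvalue:.3f}")
print(f"share of movies with an effect: {share:.3f}")
print(f"ANOVA p-value: {anova.pvalue:.2e}, Kruskal-Wallis p-value: {kruskal.pvalue:.2e}")
```

Running both ANOVA and Kruskal-Wallis on the same groups is a common cross-check: ANOVA compares group means and assumes roughly normal data, while Kruskal-Wallis is rank-based and makes no normality assumption, which can matter for ratings on a bounded scale.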
