Releases: RaedAddala/Scraping-IMDB
Added New Output Formats
Updated the notebook and improved the data cleaning and normalization.
Export to:
- Feather
- Parquet
- CSV
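A minimal sketch of how one DataFrame could be written to the three formats listed above, using pandas. The function name `export_dataset`, the `stem` default, and the output layout are assumptions for illustration and are not taken from the notebook.

```python
from pathlib import Path

import pandas as pd


def export_dataset(df, out_dir, stem="final_data", formats=("feather", "parquet", "csv")):
    """Write df once per requested format and return the written paths.

    Hypothetical helper: the names and defaults are illustrative only.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # One writer per format; the row index is dropped in every case.
    writers = {
        "feather": lambda path: df.reset_index(drop=True).to_feather(path),
        "parquet": lambda path: df.to_parquet(path, index=False),
        "csv": lambda path: df.to_csv(path, index=False),
    }

    written = []
    for fmt in formats:
        if fmt not in writers:
            raise ValueError(f"unsupported format: {fmt}")
        path = out_dir / f"{stem}.{fmt}"
        writers[fmt](path)
        written.append(path)
    return written
```

The Feather writer needs the pyarrow package installed, and the Parquet writer needs pyarrow or fastparquet; the CSV writer has no extra dependency.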
Cleaned Data from 1920 to 2025
Changes from the previous release:
- Consistent naming convention across all columns.
- Fixed dates.
- All empty and null-like values are now treated as pd.NA.
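As an illustration of the last point, here is a hedged sketch of how empty and null-like values might be mapped to pd.NA with pandas. The `NULL_LIKE` set and the function name `normalize_missing` are assumptions; the notebook's actual list of null-like markers may differ.

```python
import pandas as pd

# Assumed set of null-like strings (compared after strip + lowercase).
NULL_LIKE = {"", "n/a", "na", "nan", "none", "null", "-"}


def normalize_missing(df):
    """Return a copy of df where null-like strings become pd.NA."""

    def clean(value):
        # Only strings are inspected; other values pass through unchanged.
        if isinstance(value, str) and value.strip().lower() in NULL_LIKE:
            return pd.NA
        return value

    out = df.copy()
    for col in out.columns:
        out[col] = out[col].astype(object).map(clean)

    # convert_dtypes() switches to nullable dtypes, so remaining
    # NaN/None values are also represented as pd.NA.
    return out.convert_dtypes()
```

For example, a title column holding "Metropolis", "  ", and "N/A" would keep the first value and report the other two as missing under `isna()`.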
Cleaned Data From 1920 to 2025
Over 60,000 Movies, 100+ Years of Data, and Rich Metadata!
About the Dataset
The final_data.csv file is a consolidated dataset combining data for the most popular 500–600 movies per year from 1920 to 2025, extracted from IMDb. This dataset aggregates all the yearly merged_movies_data_[year].csv files into a comprehensive CSV file for streamlined analysis.
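The aggregation described above could look roughly like the following sketch. It assumes the yearly files sit in one directory and follow the merged_movies_data_[year].csv pattern; the function name `merge_yearly_files` and the added `source_year` column are hypothetical and not part of the published dataset.

```python
from pathlib import Path

import pandas as pd


def merge_yearly_files(data_dir, start_year=1920, end_year=2025):
    """Concatenate merged_movies_data_<year>.csv files into one DataFrame."""
    frames = []
    for year in range(start_year, end_year + 1):
        path = Path(data_dir) / f"merged_movies_data_{year}.csv"
        if not path.exists():
            continue  # skip years that have no file
        frame = pd.read_csv(path)
        frame["source_year"] = year  # hypothetical provenance column
        frames.append(frame)

    if not frames:
        raise FileNotFoundError(f"no yearly files found in {data_dir}")

    # ignore_index gives the merged frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

Calling `merge_yearly_files("data").to_csv("final_data.csv", index=False)` would then write the consolidated file, where "data" stands in for wherever the yearly files are stored.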
Links:
- Cleaned Data on Kaggle.
- Step-by-step Kaggle Notebook that merges and cleans the data.
- Original Data on Kaggle.
- Analysis Notebooks can be found here on Kaggle.
What's Changed
- Fix previous problems and add more years by @RaedAddala in #2
New Contributors
- @RaedAddala made their first contribution in #2
Full Changelog: cleaned_data.1960-2024...cleaned_data_from_1920_to_2025
Data From 1960 to 2024
This CSV file contains all the extracted data from 1960 to 2024.
For more information on its structure, see the README file.
Also, check the Kaggle link: https://www.kaggle.com/datasets/raedaddala/imdb-movies-from-1960-to-2023
Cleaned Data From 1960 to 2024
Over 30,000 Movies, 60+ Years of Data, and Rich Metadata!
About the Dataset
The final_data.csv file is a consolidated dataset combining data for the most popular 500–600 movies per year from 1960 to 2024, extracted from IMDb. This dataset aggregates all the yearly merged_movies_data_[year].csv files into a single, comprehensive CSV file for streamlined analysis.
Links:
- Cleaned Data on Kaggle.
- Step-by-step Kaggle Notebook that merges and cleans the data.
- Original Data on Kaggle.
- Analysis Notebooks can be found here on Kaggle.