- Lisa Mechaly Bensoussan
- Jeremy Hakoune
This project uses SQL queries on Google BigQuery and Unix shell commands to analyze large Stack Overflow datasets. The tasks cover extracting, cleaning, and analyzing data on JavaScript and Python questions. The goal is to explore patterns, trends, and user behavior around these two languages through querying and data wrangling.

SQL Queries using BigQuery:
- Querying Stack Overflow data to identify the most popular JavaScript-related posts.
- Statistical analysis of post activity across the days of the week.
- Cross-analysis between JavaScript and Python-related questions.
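
The shape of the three queries above can be sketched locally. This is a minimal illustration, not the project's BigQuery SQL: it swaps in Python's built-in `sqlite3` as a stand-in for BigQuery, and the `questions` table, its columns, the pipe-delimited tag format, and the sample rows are all assumptions made for the example.

```python
import sqlite3

# Hypothetical sample rows: (title, tags, view_count, creation_date).
# Tags are assumed to be pipe-delimited, e.g. "javascript|python".
SAMPLE_QUESTIONS = [
    ("How do closures work?", "javascript|closures", 900, "2020-01-06"),  # Monday
    ("Sort a list of dicts", "python|sorting", 500, "2020-01-06"),        # Monday
    ("Call Python from JS", "javascript|python", 300, "2020-01-07"),      # Tuesday
    ("What is hoisting?", "javascript", 1200, "2020-01-12"),              # Sunday
]

# Matches a whole tag inside a pipe-delimited tag string.
TAG_FILTER = "'|' || tags || '|' LIKE '%|' || ? || '|%'"


def build_db(rows):
    """Load the sample rows into an in-memory SQLite table (stand-in for BigQuery)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE questions "
        "(title TEXT, tags TEXT, view_count INTEGER, creation_date TEXT)"
    )
    conn.executemany("INSERT INTO questions VALUES (?, ?, ?, ?)", rows)
    return conn


def top_by_tag(conn, tag, limit=10):
    """Most viewed questions carrying the given tag."""
    sql = (
        f"SELECT title, view_count FROM questions WHERE {TAG_FILTER} "
        "ORDER BY view_count DESC LIMIT ?"
    )
    return conn.execute(sql, (tag, limit)).fetchall()


def posts_per_weekday(conn):
    """Question counts per weekday; SQLite's %w gives 0 for Sunday through 6 for Saturday."""
    sql = (
        "SELECT CAST(strftime('%w', creation_date) AS INTEGER) AS weekday, COUNT(*) "
        "FROM questions GROUP BY weekday ORDER BY weekday"
    )
    return conn.execute(sql).fetchall()


def count_with_both_tags(conn, tag_a, tag_b):
    """Number of questions tagged with both tag_a and tag_b."""
    sql = f"SELECT COUNT(*) FROM questions WHERE {TAG_FILTER} AND {TAG_FILTER}"
    return conn.execute(sql, (tag_a, tag_b)).fetchone()[0]


if __name__ == "__main__":
    conn = build_db(SAMPLE_QUESTIONS)
    print(top_by_tag(conn, "javascript", limit=2))
    print(posts_per_weekday(conn))
    print(count_with_both_tags(conn, "javascript", "python"))
```

The delimiter-wrapped `LIKE` pattern is used so that a tag such as `java` would not match `javascript`; date functions and string operators differ between SQLite and BigQuery, so the SQL text would need adapting before running it there.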

Unix Shell Commands:
- Working with large text files from Stack Overflow.
- Counting words and handling large-scale text data using shell commands and Python scripts.
- Splitting files based on years and performing word frequency analysis.
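
The splitting and word-frequency steps listed above can be sketched in Python as follows. The input format is an assumption: each line is taken to be `<YYYY-MM-DD>,<text>`, which may not match the actual files, and the helper names `parse_line`, `split_by_year`, and `word_counts` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# A "word" here is a run of lowercase letters, digits, or apostrophes (an assumption).
WORD_RE = re.compile(r"[a-z0-9']+")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_line(line):
    """Return (year, text) for a '<YYYY-MM-DD>,<text>' line, or None if malformed."""
    date, sep, text = line.partition(",")  # split on the first comma only
    date = date.strip()
    if not sep or not DATE_RE.fullmatch(date):
        return None
    return date[:4], text.strip()


def split_by_year(lines):
    """Group the text of each well-formed line under its year; skip malformed lines."""
    by_year = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            year, text = parsed
            by_year[year].append(text)
    return dict(by_year)


def word_counts(texts):
    """Case-insensitive word frequencies over an iterable of strings."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.lower()))
    return counts


if __name__ == "__main__":
    sample = [
        "2019-03-02,How to sort a list in Python",
        "2019-07-11,Python list comprehension, explained",
        "2020-01-05,JavaScript closures explained",
        "not a record",
    ]
    for year, texts in sorted(split_by_year(sample).items()):
        print(year, word_counts(texts).most_common(3))
```

For files too large to hold in memory, the same functions can be fed a file object line by line, writing each year's lines to its own output file instead of collecting them in a dictionary.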

For each task, the SQL queries, shell commands, and Python scripts are provided together with their outputs, to demonstrate that the solutions are correct.
- All shell commands and SQL queries were run using the provided dataset.
- The data was cleaned and processed to handle commas and special characters for accurate results.
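
One possible way to handle commas and special characters is sketched below. It is an illustration under assumptions, not the cleaning actually applied in this project: it assumes CSV input with quoted fields, treats HTML entities and punctuation as the "special characters", and the function names `read_rows` and `clean_text` are hypothetical.

```python
import csv
import html
import io
import re


def read_rows(csv_text):
    """Parse CSV text so that commas inside quoted fields do not split the field."""
    return list(csv.reader(io.StringIO(csv_text)))


def clean_text(text):
    """Decode HTML entities, replace punctuation with spaces, lowercase, collapse whitespace."""
    text = html.unescape(text)            # e.g. "&amp;" becomes "&"
    text = re.sub(r"[^\w\s]", " ", text)  # drop anything that is not a word char or space
    return " ".join(text.lower().split())


if __name__ == "__main__":
    rows = read_rows('1,"Sort a list, fast",python\n')
    print(rows)
    print(clean_text("Sort &amp; filter, quickly!"))
```

Stripping all punctuation is a blunt choice: it also reduces names such as `C++` to `c`, so a real pipeline may need an allow-list for language names.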