This project focuses on analyzing Stack Overflow data related to JavaScript and Python questions using a combination of SQL queries (Google BigQuery) and Unix shell commands. The aim is to explore trends, activity patterns, and user behavior around these popular programming languages through data wrangling and querying techniques.

lisabensoussan/BigData_Midterm

Big Data Mining

Midterm 52019/52002, 2023-24

Students:

  • Lisa Mechaly Bensoussan
  • Jeremy Hakoune

Project Overview:

This project uses SQL queries on Google BigQuery and Unix shell commands to analyze large Stack Overflow datasets. The tasks focus on extracting, cleaning, and analyzing data on JavaScript and Python questions, with the goal of exploring patterns, trends, and user behavior around these two languages.

Tasks:

  1. SQL Queries using BigQuery:

    • Querying Stack Overflow data to identify the most popular JavaScript-related posts.
    • Statistical analysis of post activity across the days of the week.
    • Cross-analysis of JavaScript- and Python-related questions.
  2. Unix Shell Commands:

    • Working with large text files from Stack Overflow.
    • Counting words and handling large-scale text data using shell commands and Python scripts.
    • Splitting files by year and performing word-frequency analysis.
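The project's SQL tasks ran on Google BigQuery. As a rough, self-contained illustration of the day-of-week and JavaScript/Python cross-analysis ideas, the sketch below uses Python's built-in `sqlite3` on a tiny hypothetical `posts_questions` table. The table name, its columns (`id`, `creation_date`, `tags`), the pipe-separated tag format, and the helper names are assumptions loosely modeled on BigQuery's public Stack Overflow dataset, not taken from the project. BigQuery syntax also differs: it would use `EXTRACT(DAYOFWEEK FROM creation_date)` (1 = Sunday) instead of SQLite's `strftime('%w', ...)` (0 = Sunday).

```python
import sqlite3

# Hypothetical toy rows: (id, creation_date, pipe-separated tags).
ROWS = [
    (1, "2023-01-02 10:00:00", "javascript|node.js"),  # Monday
    (2, "2023-01-02 12:30:00", "python|pandas"),       # Monday
    (3, "2023-01-03 09:15:00", "javascript|python"),   # Tuesday
    (4, "2023-01-07 18:45:00", "javascript"),          # Saturday
    (5, "2023-01-08 08:00:00", "python"),              # Sunday
]

# Questions per day of week for one tag; strftime('%w') gives 0 = Sunday ... 6 = Saturday.
# Wrapping tags in '|' makes the LIKE match whole tags only ('java' will not match 'javascript').
DAY_OF_WEEK_SQL = """
SELECT CAST(strftime('%w', creation_date) AS INTEGER) AS dow, COUNT(*) AS n
FROM posts_questions
WHERE '|' || tags || '|' LIKE ?
GROUP BY dow
ORDER BY dow
"""

# Number of questions carrying both tags (the cross-analysis idea).
BOTH_TAGS_SQL = """
SELECT COUNT(*)
FROM posts_questions
WHERE '|' || tags || '|' LIKE ? AND '|' || tags || '|' LIKE ?
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database holding the hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts_questions (id INTEGER, creation_date TEXT, tags TEXT)")
    conn.executemany("INSERT INTO posts_questions VALUES (?, ?, ?)", ROWS)
    return conn


def tag_pattern(tag: str) -> str:
    """LIKE pattern matching an exact tag inside a '|'-delimited tag list."""
    return f"%|{tag}|%"


def questions_per_weekday(conn: sqlite3.Connection, tag: str) -> list:
    return conn.execute(DAY_OF_WEEK_SQL, (tag_pattern(tag),)).fetchall()


def questions_with_both(conn: sqlite3.Connection, tag_a: str, tag_b: str) -> int:
    return conn.execute(BOTH_TAGS_SQL, (tag_pattern(tag_a), tag_pattern(tag_b))).fetchone()[0]


if __name__ == "__main__":
    conn = build_db()
    print(questions_per_weekday(conn, "javascript"))  # [(1, 1), (2, 1), (6, 1)]
    print(questions_with_both(conn, "javascript", "python"))  # 1
```

Matching on `'|' || tags || '|'` rather than a bare substring keeps related tags apart, which matters when comparing a language against similarly named ones.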
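The text-processing steps in the Unix task (splitting a Stack Overflow text dump by year, then counting word frequencies) can be sketched in Python under an assumed input format: one record per line, starting with an ISO date such as `2019-05-01` followed by free text. That format, the function names, and the tokenization rule are hypothetical choices, not the project's actual scripts.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List

# Assumed line format: "YYYY-MM-DD<whitespace>free text".
LINE_RE = re.compile(r"^(\d{4})-\d{2}-\d{2}\s+(.*)$")
# Words are runs of lowercase letters, digits, '_', '+' and '#', so "c++" and "c#" survive.
WORD_RE = re.compile(r"[a-z0-9_+#]+")


def split_by_year(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group the text part of each line under its four-digit year; skip malformed lines."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            year, text = match.groups()
            buckets[year].append(text)
    return dict(buckets)


def word_frequencies(texts: Iterable[str]) -> Counter:
    """Count lower-cased words across all texts."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.lower()))
    return counts


if __name__ == "__main__":
    sample = [
        "2019-03-01 How to parse JSON in JavaScript?",
        "2019-07-15 Python list comprehension vs map",
        "2020-01-10 JavaScript async/await with Python backend",
    ]
    for year, texts in sorted(split_by_year(sample).items()):
        print(year, word_frequencies(texts).most_common(3))
```

On the shell side, a comparable count is commonly produced with a pipeline such as `tr -cs 'A-Za-z' '\n' < file.txt | tr 'A-Z' 'a-z' | sort | uniq -c | sort -rn`, where `file.txt` is a placeholder; note that this letters-only tokenization would split `c++` differently from the regex above.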

Proof of Work:

For each task, the repository provides the SQL queries, shell commands, and Python scripts used, along with their outputs as evidence that the solutions are correct.

Remarks:

  • All shell commands and SQL queries were run on the provided dataset.
  • The data was cleaned and processed to handle commas and special characters for accurate results.
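Commas inside fields are the usual pitfall when Stack Overflow exports are processed as comma-separated text: a naive `split(",")` breaks a quoted title into several columns. A minimal sketch of a safer approach, assuming the data is standard CSV with double-quoted fields (an assumption; the project's own cleaning steps are in its scripts), uses Python's standard-library `csv` module and then replaces special characters with spaces. The `clean_text` rule and the sample record are hypothetical.

```python
import csv
import io
import re
from typing import List

# Keep letters, digits, whitespace, '+' and '#'; everything else counts as a special character.
SPECIAL_CHARS_RE = re.compile(r"[^A-Za-z0-9\s+#]")


def parse_csv_line(line: str) -> List[str]:
    """Parse one CSV record, honouring double-quoted fields that contain commas."""
    return next(csv.reader(io.StringIO(line)), [])


def clean_text(text: str) -> str:
    """Replace special characters with spaces and collapse repeated whitespace."""
    return " ".join(SPECIAL_CHARS_RE.sub(" ", text).split())


if __name__ == "__main__":
    record = '42,"Arrays, objects, and maps in JavaScript?",javascript'
    print(record.split(","))      # naive split: the quoted title is broken into 3 pieces
    fields = parse_csv_line(record)
    print(fields)                 # ['42', 'Arrays, objects, and maps in JavaScript?', 'javascript']
    print(clean_text(fields[1]))  # Arrays objects and maps in JavaScript
```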
