Welcome to the world of Nikhil's Data Adventures!
This GitHub repository contains the projects I worked on as part of my graduate studies in Data Analytics at SJSU. They include:
- Urban audio classification on the UrbanSound8K and ESC-50 audio datasets using Vision Transformers with transfer learning, plus a comparative analysis against CNN variants (CNN, CNN + LSTM, 2D-CNN). Data augmentation with SpecAugment and other noise-inducing techniques helped the models generalize better to unseen data. Achieved SOTA scores on the classification metrics (accuracy, precision, recall, F1, and ROC AUC). The brief report is here: https://shorturl.at/yFMP6
- Predicting side effects from drug-drug interactions using the BioSNAP datasets and graph modeling. Modeled the drug-drug interactions to predict the likelihood of a side effect for a given drug pair. This has high real-world relevance, as polypharmacy is an active area of research. Used Neo4j to model the graphs, Jaccard distances to understand the distribution of nodes and edges in the network, and Word2Vec for feature engineering. Standard ML algorithms such as decision trees and logistic regression gave poor performance metrics, while a GCN reached 87% accuracy along with significant improvement in the other metrics. The brief report and code base are here: https://shorturl.at/lwP45
- Big data application to predict various employment-related metrics, such as benefits, salary, and median salary across departments and job families, to help job seekers understand the hiring trends of the City of San Francisco across different departments. The tech stack was Apache Spark, Kafka, EMR, PySpark, Redshift, Tableau, React, and APIs. The code and report are here: https://shorturl.at/nswNT
- Effects of BART Phase 2 on SJSU Commuters: this project was part of the database course. I modeled the BART and VTA GTFS datasets for the Bay Area. The tech stack was Workbench, Tableau, SQL, BigQuery, and dbt Cloud; we also analyzed and modeled the BART routes as a network in Neo4j and built a React UI. The project is here: https://shorturl.at/jmCV2 and the visualizations I built in Tableau are here: https://public.tableau.com/app/profile/nikhil3423/vizzes
- Exploring the Marvel Universe with extensive data analysis and visualization: this project was done as part of my Data Visualization course, so the focus is entirely on data visualization and data pre-processing, along with a lot of SQL. The tech stack is APIs, Tableau, Worksheets, dbt, SQL, and Python. The description is available in this blog post: https://rb.gy/qh6bj6
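
As a small illustration of the Jaccard distance mentioned in the drug-drug interaction project above, here is a minimal sketch in plain Python. It is not the project's code: the edge list, the drug names, and the helper names `build_neighbors` and `jaccard_distance` are hypothetical, and it assumes the distance is taken between the neighbor sets of two nodes in an undirected interaction graph.

```python
def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets (0.0 when both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def build_neighbors(edges):
    """Map each node to the set of nodes it shares an undirected edge with."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors


# Hypothetical toy interaction graph; the real project uses BioSNAP data.
edges = [
    ("drug_a", "drug_b"),
    ("drug_a", "drug_c"),
    ("drug_b", "drug_c"),
    ("drug_c", "drug_d"),
]
neighbors = build_neighbors(edges)

# Drugs that interact with similar partners have a small distance.
dist_ab = jaccard_distance(neighbors["drug_a"], neighbors["drug_b"])
dist_ad = jaccard_distance(neighbors["drug_a"], neighbors["drug_d"])
print(f"a-b: {dist_ab:.3f}, a-d: {dist_ad:.3f}")
```

A distance near 0 means two drugs share most of their interaction partners, while a distance near 1 means their neighborhoods barely overlap.
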
I also have a Medium account, where I share short snippets of the work I have done: https://medium.com/@nikhil.thota_81762