📊 Codveda Technologies — Data Analytics Internship

Intern: Omokhoa Oshose Tosayoname | ID: CV/A1/61250 | Role: Data Analysis Intern

Organisation: Codveda Technologies | Duration: March – May 2026 | Mode: Remote

📋 Table of Contents

Project Overview
Repository Structure
Tools and Technologies
Level 1 — Basic Data Analysis
Level 2 — Intermediate Data Analysis
Level 3 — Advanced Data Analysis
Key Findings Summary
Dashboards
How to Run the Notebooks
Connect With Me

🎯 Project Overview

This repository contains all completed work for the Codveda Technologies Data Analytics Internship — a hands-on, project-based programme designed to develop real-world data analytics skills across three progressive levels.

Over the course of the internship, six data analytics tasks were completed spanning the full analytics pipeline:

Level	Focus	Tasks Completed
Level 1 — Basic	Data exploration and visualisation	EDA · Basic Data Visualisation
Level 2 — Intermediate	Pattern discovery and time-based analysis	Time Series Analysis · K-Means Clustering
Level 3 — Advanced	NLP and business intelligence	Sentiment Analysis · Interactive Dashboards

All tasks are documented in fully executable Jupyter Notebooks with step-by-step explanations, inline visualisations, and analytical insights. Dashboards were built in both Power BI and Tableau Public.

📁 Repository Structure

codveda-data-analytics-internship/
│
├── README.md
├── requirements.txt
│
├── Level_1_Basic/
│   ├── Level_1_Basic_Data_Analysis.ipynb
│   └── outputs/
│       ├── 01_histograms.png
│       ├── 02_boxplots.png
│       ├── 03_scatter_plots.png
│       ├── 04_correlation_heatmap.png
│       ├── 05_bar_plot.png
│       └── 06_line_charts.png
│
├── Level_2_Intermediate/
│   ├── Level_2_Intermediate_Data_Analysis.ipynb
│   └── outputs/
│       ├── 07_aapl_raw_timeseries.png
│       ├── 08_moving_averages.png
│       ├── 09_decomposition.png
│       ├── 10_multi_stock_comparison.png
│       ├── 11_elbow_method.png
│       └── 12_kmeans_clusters.png
│
├── Level_3_Advanced/
│   ├── Level_3_Advanced_Data_Analysis.ipynb
│   └── outputs/
│       ├── 13_sentiment_distribution.png
│       ├── 14_polarity_subjectivity.png
│       ├── 15_sentiment_by_platform_country.png
│       ├── 16_sentiment_over_time.png
│       ├── 17_word_clouds.png
│       ├── 18_top_words.png
│       ├── 19_polarity_boxplot.png
│       ├── powerbi_dashboard_screenshot.png
│       └── tableau_dashboard_screenshot.png
│
└── datasets/
    ├── iris.csv
    ├── Stock_Prices_Data_Set.csv
    ├── Sentiment_dataset.csv
    ├── Churn_Dashboard_Data.csv
    ├── churn-bigml-80.csv
    └── churn-bigml-20.csv

🛠 Tools and Technologies

Category	Tools
Language	Python 3.10+
Data Manipulation	pandas, numpy
Visualisation	matplotlib, seaborn, WordCloud
Machine Learning	scikit-learn (KMeans, StandardScaler, PCA)
Time Series	statsmodels
NLP	TextBlob, re (regex)
Business Intelligence	Microsoft Power BI Desktop, Tableau Public
Development Environment	Jupyter Notebook, Google Colab
Version Control	Git, GitHub

📗 Level 1 — Basic Data Analysis

Dataset: Iris Dataset (150 records, 5 features)

Notebook: Level_1_Basic_Data_Analysis.ipynb

Task 2 — Exploratory Data Analysis (EDA)

Performed a comprehensive exploratory analysis of the Iris dataset to understand its structure, distribution, and feature relationships.

Steps taken:

Removed 3 duplicate rows identified during initial inspection
Calculated mean, median, mode, and standard deviation for all four numerical features
Computed a full correlation matrix and interpreted pairwise relationships
Generated per-species summary statistics revealing how the three flower species differ
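
The steps above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the seven-row sample and the column names (`sepal_length`, `species`, and so on) are assumptions, and the real analysis reads the full file from datasets/iris.csv.

```python
import pandas as pd

# Tiny hypothetical sample; the notebook would load the full dataset instead,
# e.g. pd.read_csv("datasets/iris.csv") (column names are assumed here).
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 5.1, 7.0, 6.4, 6.3, 5.8],
    "sepal_width":  [3.5, 3.0, 3.5, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width":  [0.2, 0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["setosa", "setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# 1. Drop exact duplicate rows (row 2 repeats row 0 in this sample).
clean = df.drop_duplicates().reset_index(drop=True)
numeric = clean.select_dtypes("number")

# 2. Mean, median, mode, and standard deviation for each numerical feature.
summary = pd.DataFrame({
    "mean": numeric.mean(),
    "median": numeric.median(),
    "mode": numeric.mode().iloc[0],  # first mode when several values tie
    "std": numeric.std(),            # sample standard deviation (ddof=1)
})

# 3. Pearson correlation matrix across the numerical features.
corr = numeric.corr()

# 4. Per-species summary statistics.
by_species = clean.groupby("species")[numeric.columns.tolist()].agg(["mean", "std"])

print(summary.round(2))
print(corr.round(2))
print(by_species.round(2))
```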

Key findings:

Finding	Detail
Most variable feature	Petal length (std = 1.76 cm)
Most consistent feature	Sepal width (std = 0.44 cm)
Strongest correlation	Petal length vs Petal width (r = 0.96)
Most separable species	Iris setosa — distinctly small petals

Task 3 — Basic Data Visualisation

Created six publication-quality charts to communicate the dataset's structure and relationships:

Chart	Insight
Histograms	Setosa petal measurements cluster far below other species
Boxplots	Virginica has widest spread; setosa most consistent
Scatter plots	Petal dimensions form near-perfect linear clusters by species
Correlation heatmap	Petal features almost perfectly correlated (r = 0.96)
Bar plot	Virginica largest on every feature except sepal width, where setosa leads
Line charts	Petal bands clearly species-separated across all samples
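
As an illustration of how two of these chart types might be produced, the sketch below draws a per-species histogram and a petal scatter plot with matplotlib. The measurements are hypothetical stand-ins, the output file name is invented, and the figure is written to a temporary folder rather than to Level_1_Basic/outputs/.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend: render to a file, no display needed
import matplotlib.pyplot as plt

# Hypothetical petal measurements (cm); the notebook plots the real columns.
petals = {
    "setosa":     {"length": [1.4, 1.3, 1.5, 1.7, 1.4], "width": [0.2, 0.2, 0.2, 0.4, 0.3]},
    "versicolor": {"length": [4.7, 4.5, 4.9, 4.0, 4.6], "width": [1.4, 1.5, 1.5, 1.3, 1.5]},
    "virginica":  {"length": [6.0, 5.1, 5.9, 5.6, 5.8], "width": [2.5, 1.9, 2.1, 1.8, 2.2]},
}

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

for species, m in petals.items():
    # Overlaid histograms show how far apart the species sit on one feature.
    ax_hist.hist(m["length"], bins=5, alpha=0.6, label=species)
    # The scatter plot shows the relationship between the two petal features.
    ax_scatter.scatter(m["length"], m["width"], label=species)

ax_hist.set(title="Petal length by species", xlabel="Petal length (cm)", ylabel="Count")
ax_scatter.set(title="Petal length vs petal width",
               xlabel="Petal length (cm)", ylabel="Petal width (cm)")
ax_hist.legend()
ax_scatter.legend()
fig.tight_layout()

# Save to a temporary folder so the sketch has no side effects on the repo.
out_path = Path(tempfile.mkdtemp()) / "petal_charts.png"
fig.savefig(out_path, dpi=150)
plt.close(fig)
print(out_path)
```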

📘 Level 2 — Intermediate Data Analysis

Datasets: Stock Prices Dataset (497,472 rows, 505 symbols) · Iris Dataset

Notebook: Level_2_Intermediate_Data_Analysis.ipynb

Task 2 — Time Series Analysis

Analysed Apple Inc. (AAPL) daily closing prices from January 2014 to December 2017.

Steps taken:

Filtered AAPL from 505 stock symbols
Converted date strings to datetime and set as index
Computed 30-day and 90-day moving averages
Applied multiplicative seasonal decomposition (period = 252 trading days)
Normalised and compared AAPL against MSFT, GOOGL, and AMZN
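
A minimal sketch of the date conversion, moving averages, and normalised comparison is shown below. It runs on synthetic prices, and the column names (`date`, `symbol`, `close`) and the helper `total_return_pct` are assumptions made for illustration; the notebook works on the real rows in datasets/Stock_Prices_Data_Set.csv.

```python
import numpy as np
import pandas as pd

# Synthetic prices for illustration only (column names are assumed).
dates = pd.bdate_range("2014-01-02", periods=300)
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "date": dates.strftime("%Y-%m-%d").tolist() * 2,
    "symbol": ["AAPL"] * 300 + ["MSFT"] * 300,
    "close": np.concatenate([
        80 * np.exp(np.cumsum(rng.normal(0.0005, 0.01, 300))),
        37 * np.exp(np.cumsum(rng.normal(0.0005, 0.01, 300))),
    ]),
})

# Convert date strings to datetime, then pivot to one price column per symbol.
raw["date"] = pd.to_datetime(raw["date"])
prices = raw.pivot(index="date", columns="symbol", values="close").sort_index()

# Moving averages on the AAPL series.
aapl = prices["AAPL"]
ma30 = aapl.rolling(window=30).mean()  # first 29 values are NaN
ma90 = aapl.rolling(window=90).mean()  # first 89 values are NaN

# Rebase every series to 100 on the first day so price levels are comparable.
normalised = prices / prices.iloc[0] * 100


def total_return_pct(start_price: float, end_price: float) -> float:
    """Percentage change from the first price to the last price."""
    return (end_price - start_price) / start_price * 100


# Check against the start and end prices reported in this README.
print(round(total_return_pct(79.02, 169.23), 1))  # 114.2
```

Rebasing to 100 is one common way to normalise; the notebook may use a different base.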

Key findings:

Finding	Detail
Total price return	+114.2% ($79.02 → $169.23)
Price minimum	$71.40 (mid-2016)
Price maximum	$176.42 (late 2017)
Best performing peer	Amazon (AMZN) — ~4× return over same period
Decomposition model	Multiplicative (annual seasonal cycle detected)

Task 3 — Clustering Analysis (K-Means)

Applied unsupervised machine learning to the Iris dataset to discover natural groupings without using species labels.

Steps taken:

Standardised all four features using StandardScaler (z = (x − μ) / σ)
Applied the Elbow Method across K = 1 to 10 — optimal K = 3 confirmed
Fitted K-Means with K=3 and validated against true species labels
Visualised clusters using PCA projection (97.7% variance explained)
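
The workflow above can be sketched with scikit-learn as follows. This is an approximation under stated assumptions: it loads scikit-learn's bundled copy of Iris (all 150 rows, duplicates included) instead of datasets/iris.csv, and the `random_state` and `n_init` values are arbitrary choices, so the figures it prints may differ from those reported in this README.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()  # stand-in for datasets/iris.csv
X, y = iris.data, iris.target

# Standardise each feature: z = (x - mean) / std.
X_scaled = StandardScaler().fit_transform(X)

# Elbow method: record inertia (within-cluster sum of squares) for K = 1..10.
inertias = []
for k in range(1, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias.append(model.inertia_)

# Fit the final model with K = 3.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_scaled)
clusters = kmeans.labels_

# Validate against the true labels: assign each cluster its majority species
# and count how many records agree with that assignment.
correct = sum(np.bincount(y[clusters == c]).max() for c in range(3))
purity = correct / len(y)

# Project to two principal components for plotting.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_.sum()

print([round(v, 1) for v in inertias])
print(f"majority-label agreement: {purity:.1%}")
print(f"variance explained by 2 components: {explained:.1%}")
```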

Key findings:

Finding	Detail
Optimal clusters	K = 3 (confirmed by elbow method)
Setosa accuracy	100% — all 48 records correctly isolated
Overall accuracy	~85% — overlap between versicolor/virginica
PCA variance explained	97.7% across two components

📙 Level 3 — Advanced Data Analysis

Datasets: Social Media Sentiment Dataset (732 posts) · Customer Churn Dataset (3,333 customers)

Notebook: Level_3_Advanced_Data_Analysis.ipynb

Task 3 — NLP Sentiment Analysis

Built a complete text analytics pipeline to classify social media posts as Positive, Negative, or Neutral.

Pipeline stages:

Data cleaning — stripped whitespace, dropped redundant columns
Emotion label mapping — consolidated 279 unique labels into 3 classes
Text preprocessing — lowercase, URL/mention removal, tokenisation, stopword removal, lemmatisation
TextBlob sentiment scoring — polarity and subjectivity computed for all 732 posts
Visualisation — 7 charts including word clouds, frequency analysis, and platform breakdowns
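
The text preprocessing stage could look something like the sketch below, which uses only the standard library. The stopword list, the regular expressions, and the function name `preprocess` are assumptions for illustration. Lemmatisation is deliberately left out because it needs an external NLP library, and this README does not say which one the notebook uses.

```python
import re

# Small illustrative stopword list; the notebook's actual list is not shown here.
STOPWORDS = {"a", "an", "the", "is", "it", "this", "and", "to", "of",
             "in", "at", "so", "i", "my"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z]+")


def preprocess(text: str) -> list[str]:
    """Lowercase, strip URLs and @mentions, tokenise, and drop stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    tokens = TOKEN_RE.findall(text)  # keeps alphabetic runs only
    return [tok for tok in tokens if tok not in STOPWORDS]


print(preprocess("Enjoying a BEAUTIFUL day at the park! @friend https://example.com #nature"))
# ['enjoying', 'beautiful', 'day', 'park', 'nature']
```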

Key findings:

Finding	Detail
Sentiment distribution	64.6% Positive · 28.3% Negative · 7.1% Neutral
TextBlob agreement rate	46.2% — expected given nuanced emotion labels
Mean polarity — Positive	+0.191
Mean polarity — Negative	−0.106
Top positive words	beautiful, enjoy, feel, amazing, love
Top negative words	fearful, shadows, storm, heartbreak, lost
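
One plausible way to arrive at an agreement rate like the one above is sketched below: consolidate the raw emotion labels into three classes, convert each polarity score into a class, and measure how often the two match. The label mapping, the ±0.05 polarity threshold, the fallback to Neutral, and the sample values are all assumptions. In the notebook the polarity scores come from TextBlob (for example `TextBlob(text).sentiment.polarity`), whereas here they are hard-coded.

```python
# Hypothetical mapping; the real notebook consolidates 279 raw labels.
LABEL_TO_CLASS = {
    "joy": "Positive",
    "excitement": "Positive",
    "sadness": "Negative",
    "fearful": "Negative",
    "neutral": "Neutral",
}


def consolidate(raw_label: str) -> str:
    """Strip whitespace, normalise case, and map to one of three classes.

    Unknown labels fall back to Neutral (an assumption made for this sketch).
    """
    return LABEL_TO_CLASS.get(raw_label.strip().lower(), "Neutral")


def label_from_polarity(polarity: float, threshold: float = 0.05) -> str:
    """Turn a polarity score in [-1, 1] into a sentiment class."""
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"


def agreement_rate(raw_labels: list[str], polarities: list[float]) -> float:
    """Share of posts where the polarity-based class matches the mapped label."""
    matches = sum(
        consolidate(label) == label_from_polarity(score)
        for label, score in zip(raw_labels, polarities)
    )
    return matches / len(raw_labels)


# Hypothetical posts: raw labels with stray whitespace, plus polarity scores.
labels = [" Joy ", "Sadness", "Neutral ", "Excitement", "Fearful"]
scores = [0.6, 0.0, 0.0, 0.3, 0.2]
print(agreement_rate(labels, scores))  # 0.6
```

A lexicon-based scorer often returns a polarity near zero for short or figurative text, which is one reason its classes can disagree with human-assigned emotion labels.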

Task 2 — Interactive Dashboard (Power BI + Tableau)

Built a Customer Churn Analysis Dashboard in both Power BI and Tableau Public using a combined churn dataset of 3,333 customer records with 7 engineered features.
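
A rough sketch of how the two churn files might be combined and enriched before loading into a BI tool is shown below. The column names are assumed to follow the public churn-bigml files, the rows are hypothetical, and only two illustrative engineered features are shown; the seven features actually used are not listed in this README. Band edges other than the 100-149 "Established" band are also assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for churn-bigml-80.csv and churn-bigml-20.csv.
cols = ["Account length", "International plan", "Customer service calls",
        "Total day charge", "Churn"]
part_80 = pd.DataFrame([[128, "No", 1, 45.07, False],
                        [107, "No", 5, 27.47, True],
                        [65, "Yes", 4, 21.95, True]], columns=cols)
part_20 = pd.DataFrame([[117, "No", 1, 31.37, False],
                        [161, "Yes", 0, 56.59, True]], columns=cols)

# Combine both files into one table with a fresh index.
churn = pd.concat([part_80, part_20], ignore_index=True)

# Engineered feature 1: flag customers with 5 or more service calls.
churn["high_service_calls"] = churn["Customer service calls"] >= 5

# Engineered feature 2: tenure bands from account length in days.
# right=False makes each band include its lower edge, e.g. [100, 150).
churn["tenure_band"] = pd.cut(
    churn["Account length"],
    bins=[0, 50, 100, 150, np.inf],
    right=False,
    labels=["0-49", "50-99", "100-149 (Established)", "150+"],
)

print(churn[["Account length", "tenure_band", "high_service_calls", "Churn"]])
```

Writing the result out with `churn.to_csv(...)` would then give a single file that both Power BI and Tableau can read.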

Dashboard visuals:

KPI Cards (Total Customers · Churn Rate · Avg Charge · Avg Service Calls)
Churn by International Plan (100% Stacked Bar)
Churn by Service Call Risk (Stacked Column — staircase pattern)
Total Charge Distribution (Histogram)
Churn Rate by State (Filled Map)
Churned Customers by Tenure Band (Pie/Donut Chart)
Day Usage vs Charge (Scatter Plot)
Interactive Filters (International Plan · Voice Mail Plan)

Key findings:

Finding	Detail
Overall churn rate	14.5% (483 of 3,333 customers)
Highest churn driver	International plan (~4× higher churn rate)
Critical risk threshold	5+ customer service calls
Highest-risk tenure segment	Established (100-149 days), 41.4% of churned customers
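
The headline figures in this table are simple ratios. The sketch below checks the overall churn rate from the counts reported above, then shows on a small hypothetical sample how a per-segment churn rate and the "times higher" comparison might be computed. The column names and the ten sample rows are assumptions.

```python
import pandas as pd

# Overall churn rate from the counts reported in this README.
overall_reported = 483 / 3333 * 100
print(round(overall_reported, 1))  # 14.5

# Hypothetical sample: 1 = churned, 0 = retained.
df = pd.DataFrame({
    "international_plan": ["No"] * 6 + ["Yes"] * 4,
    "churn": [0, 0, 0, 0, 0, 1, 1, 1, 0, 0],
})

# Churn rate per segment is the mean of the 0/1 churn flag within each group.
by_plan = df.groupby("international_plan")["churn"].mean()

# How many times higher the churn rate is for customers with the plan.
ratio = by_plan["Yes"] / by_plan["No"]

print(by_plan.round(3))
print(round(ratio, 1))  # 3.0 for this sample
```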

💡 Key Findings Summary

Across all six tasks, three cross-cutting analytical themes emerged:

1. Feature importance is not obvious from inspection alone

In the Iris dataset, petal dimensions proved far more discriminating than sepal dimensions. This only became apparent through EDA and was later confirmed by K-Means clustering. Similarly, in the churn dataset the international plan variable was not the most prominent field but turned out to be the strongest churn predictor.

2. Real-world data is always messy

The sentiment dataset arrived with 279 unique emotion labels, whitespace inconsistencies, and redundant index columns. The churn dataset required feature engineering before it could support meaningful dashboard analysis. Professional data work begins with cleaning, not analysis.

3. Communication is as important as computation

The same churn insights that exist as numbers in a Python script become actionable when presented in an interactive Power BI or Tableau dashboard that a business stakeholder can explore without technical knowledge.

📊 Dashboards

Tableau Public Dashboard (Live & Interactive)

🔗 Customer Churn Analysis Dashboard

Click the link above to explore the fully interactive dashboard — filter by International Plan, Voice Mail Plan, and click any chart to cross-filter the entire dashboard.

Power BI Dashboard

Built in Microsoft Power BI Desktop. Screenshot available in Level_3_Advanced/outputs/powerbi_dashboard_screenshot.png

▶ How to Run the Notebooks

Option 1 — Google Colab (Recommended, no setup required)

Open the notebook directly in Google Colab
Upload the required datasets to a datasets/ folder in the Colab file system
Run all cells top to bottom — all dependencies are installed automatically

Option 2 — Local Jupyter Notebook

Prerequisites: Python 3.10+, pip

Step 1 — Clone the repository:

git clone https://github.com/Tosa9/codveda-data-analytics-internship.git
cd codveda-data-analytics-internship

Step 2 — Install dependencies:

pip install -r requirements.txt

Step 3 — Launch Jupyter:

jupyter notebook

Step 4 — Open any notebook from the Level folders and run all cells.

Dataset Note

Datasets are included in the datasets/ folder. The Stock Prices dataset (~24MB) may take a moment to load.
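
Before running a notebook, it can help to confirm that the dataset files are where the code expects them, particularly on Colab where they are uploaded by hand. The helper below is a hypothetical convenience and not part of the repository; the file names are taken from the repository structure listed earlier.

```python
from pathlib import Path

# File names as listed under datasets/ in the repository structure.
REQUIRED = [
    "iris.csv",
    "Stock_Prices_Data_Set.csv",
    "Sentiment_dataset.csv",
    "Churn_Dashboard_Data.csv",
    "churn-bigml-80.csv",
    "churn-bigml-20.csv",
]


def missing_datasets(folder: str = "datasets") -> list[str]:
    """Return the required file names that are not present in the folder."""
    base = Path(folder)
    return [name for name in REQUIRED if not (base / name).is_file()]


missing = missing_datasets()
if missing:
    print("Missing dataset files:", ", ".join(missing))
else:
    print("All dataset files found.")
```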

📦 Requirements

pandas>=2.0
numpy>=1.24
matplotlib>=3.7
seaborn>=0.12
scikit-learn>=1.3
statsmodels>=0.14
textblob>=0.17
wordcloud>=1.9
jupyter>=1.0
nbformat>=5.9

Install all dependencies with:

pip install -r requirements.txt

🤝 Connect With Me

Omokhoa Oshose Tosayoname

Data Science/Analysis Intern | Mechanical Engineering Student | Junior Project Manager

Completed as part of the Codveda Technologies Data Analytics Internship Programme

Intern ID: CV/A1/61250 | March – May 2026

#Codveda #CodvedaTech #CodvedaInternship #DataAnalytics #Python #MachineLearning
