
Data Analysis Using Python

Course Content

| Topic No | Topic Name | Sub Topics |
|---|---|---|
| 1 | Introduction to Data Science | Introduction to Data Science; What is Data Science; Types of Data in Statistics (Numerical & Categorical); Overview of Python Concepts; What is Machine Learning; Machine Learning Classification; Types of Algorithms |
| 2 | Data Manipulation with NumPy | Introduction; NumPy Arrays; NumPy Basics; Math; Indexing; Random; Filtering; Statistics; Aggregation; Saving Data |
| 3 | Data Analysis with Pandas | Introduction to Data Analysis using Pandas; Pandas Series; Pandas DataFrame; Combining; Indexing; File I/O; Grouping; Features; Filtering; Sorting; Statistical Plotting |
| 4 | Data Cleaning with Pandas and Data Preprocessing with Scikit-Learn | Introduction to Data Preprocessing and Scikit-Learn; Standardizing Data; Robust Scaling; Data Range; Normalizing Data; Label Encoding and One-Hot Encoding; Polynomial Features; Working with Duplicates and Missing Values; Choosing Replacement Values for Missing Data Based on the Data Type; Identifying and Eliminating Outliers; Filling Missing Data Using Imputation |
| 5 | Introduction to Data Visualization with Matplotlib | Introduction to Visualization and Python Packages; Matplotlib History and Architecture; Introduction to Plotting; Line Plot; Scatter Plot; Bar Graph; Histogram; Pie Chart; Box Plot |
| 6 | Data Visualization with Seaborn | Using Seaborn Styles; Setting the Default Style; Color Palettes; Creating Custom Palettes; stripplot() and swarmplot(); boxplots, violinplots; barplots, pointplots and countplots; Regression Plots; Binning Data; Creating Heatmaps; Applying These Techniques to a Raw Dataset; Introduction to Kaggle and Other Data Sources |
| 7 | Regression Models | Linear Regression with One Variable; Evaluation Metrics in Regression Models; Train/Test Splitting of Data & Cross Validation; Linear Regression with Multiple Variables; Polynomial Features; Non-Linear Regression with One Variable; Non-Linear Regression with Multiple Variables |
| 8 | Regularization Models | Underfitting; Overfitting; Best Fit; Applying Ridge Regression; Lasso Regression Algorithms |
| 9 | Classification Models - 1 | Introduction to Categorical Types of Data; Types of Classification; K-Nearest Neighbors Classifier; Evaluation Metrics for Classification Models; Logistic Regression; Support Vector Machines |
| 10 | Classification Models - 2 | Introduction to Decision Trees; Terminology Related to Decision Trees; Types of Decision Trees; Decision Tree Classifier; Decision Tree Regressor; Random Forest Algorithm |
| 11 | Unsupervised Machine Learning | Introduction to Unsupervised Learning; Types of Unsupervised Learning |
| 12 | Clustering | Introduction to Clustering; Types of Clustering Methods; KMeans Clustering; Applications |
| 13 | Dimensionality Reduction | Dimensionality Reduction; Principal Component Analysis (PCA) |

Hardware Requirements:

  • Intel Core i3 processor or better (required)
  • 4 GB of RAM or more (recommended)
  • Reliable internet connection
  • OS: Windows 10 preferred

Duration:

60 hours (6 hours per day × 10 days)

Course Objectives:

  • To introduce students and faculty to the basic concepts and techniques of Data Science and Machine Learning.
  • To develop skills in using current machine learning software to solve practical problems.
  • To gain experience in independent study and research.

Entry Requirements:

  • Students must have knowledge of Python programming.
  • Students should have a basic knowledge of statistics, algebra, and mathematics.