Dataset link: https://www.kaggle.com/hugodarwood/epirecipes?select=epi_r.csv
Size of Dataset:- 17736 rows and 680 columns
By:- Kuntal Gorai and Svsc Santosh
This step involves cleaning the data by:-
i) Removing outliers
ii) Replacing null values
iii) Removing duplicate records
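The three cleaning steps above could be sketched roughly as follows. This is only an illustration on a toy table, not the project's actual code: the column names, the median fill for nulls, and the 1.5 x IQR rule for outliers are all assumptions, since the notes do not say which methods were used. Nulls are filled before outliers are removed so that missing values are not discarded by the range check.

```python
import numpy as np
import pandas as pd


def clean(df, numeric_cols, k=1.5):
    """Drop duplicate rows, fill nulls with the median, drop IQR outliers."""
    # Remove exact duplicate records first.
    out = df.drop_duplicates().copy()

    # Replace null values with each column's median (an assumed strategy).
    for col in numeric_cols:
        out[col] = out[col].fillna(out[col].median())

    # Keep only rows inside [Q1 - k*IQR, Q3 + k*IQR] for every listed column.
    keep = pd.Series(True, index=out.index)
    for col in numeric_cols:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        keep &= out[col].between(q1 - k * iqr, q3 + k * iqr)

    return out[keep].reset_index(drop=True)


# Toy stand-in for the real file; the values are made up for illustration.
toy = pd.DataFrame({
    "calories": [400, 420, 410, np.nan, 390, 30000, 400],
    "rating": [4.0, 3.5, 4.5, 4.0, 5.0, 4.0, 4.0],
})

cleaned = clean(toy, numeric_cols=["calories"])
print(cleaned)
```

In the toy table the last row duplicates the first, one calorie value is missing, and 30000 is an obvious outlier, so each step has something to act on.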
It also includes using PCA to reduce multicollinearity between attributes in the data
Visualising the data with graphs
Reducing the dataset from 680 columns to around 6 columns using different approaches
Preparing the data so it is ready to be used for model training
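One possible way to do the PCA reduction is sketched below with scikit-learn. The data here is synthetic (40 correlated columns built from 6 hidden factors), standing in for the real 680-column table, and standardising before PCA is an assumption rather than something the notes state. The choice of 6 components follows the "around 6 columns" target.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 rows, 40 columns mixed from 6 latent factors,
# so many columns are strongly correlated with each other.
latent = rng.normal(size=(200, 6))
mixing = rng.normal(size=(6, 40))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 40))

# Standardise so no column dominates purely because of its scale.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first 6 principal components.
pca = PCA(n_components=6)
X_reduced = pca.fit_transform(X_scaled)

# Fraction of the total variance kept by the 6 components.
explained = pca.explained_variance_ratio_.sum()

# The new columns are uncorrelated: off-diagonal correlations are ~0.
corr = np.corrcoef(X_reduced, rowvar=False)
max_off_diag = np.abs(corr - np.eye(6)).max()

print(X_reduced.shape)
print(explained, max_off_diag)
```

Because principal components are mutually uncorrelated, models trained on `X_reduced` do not see the multicollinearity present in the original columns.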
By:- Venkata Krishna Arjun Vupalla and S Mahammad Aasheesh
This step includes:-
1) Implementation of the three models:-
a) Multiple Linear Regression
b) Support Vector Machines
c) Decision Trees
2) Training and testing the models with data
3) Applying the models to the test data
4) Obtaining model metrics
5) Comparing the models to determine which is the best
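The modelling steps above might look roughly like this with scikit-learn. It is a sketch under several assumptions: the task is treated as regression (so the SVM and tree are `SVR` and `DecisionTreeRegressor`), the data is synthetic with 6 features, the split is 80/20, and MSE and R^2 are used as metrics. None of these choices are confirmed by the notes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the reduced dataset: 6 features, linear target.
X = rng.normal(size=(500, 6))
true_coef = np.array([1.5, -2.0, 0.5, 0.0, 1.0, -0.5])
y = X @ true_coef + 0.1 * rng.normal(size=500)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Multiple Linear Regression": LinearRegression(),
    "Support Vector Machine": SVR(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
}

# Fit each model on the training split and score it on the test split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "mse": mean_squared_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }

# Pick the model with the lowest test error.
best = min(results, key=lambda name: results[name]["mse"])

for name, scores in results.items():
    print(name, scores)
print("Best by MSE:", best)
```

The synthetic target is linear by construction, which favours linear regression, so the ranking printed here says nothing about which model performs best on the real dataset.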