Welcome to the Repository!
This repository contains 8 projects, each focusing on a unique machine learning task. These projects explore data analysis, regression, classification, dimensionality reduction, clustering, and deep learning using diverse datasets. Below is a detailed description of each project and its objectives.
Build a regression model to predict attributes such as GDP per capita, social support, freedom, and other socioeconomic indicators.
- Data Analysis:
- Summarize the dataset (size, feature types).
- Visualize distributions and analyze correlations.
- Feature Engineering:
- Compute statistical values and identify features requiring special treatment.
- Model Training:
- Train and evaluate linear regression models using closed-form solutions and SGD.
- Explore Ridge, Lasso, and Elastic Net regularization.
- Compare performance with polynomial regression.
- Evaluation:
- Test the model using a separate test set and evaluate its performance.
- Jupyter Notebook for analysis and visualization.
- Libraries: NumPy, Pandas, Matplotlib, Scikit-learn.
Classify rock types (Igneous, Metamorphic, Sedimentary) using a dataset with 11 features.
- Data Preparation:
- Process and split the dataset into training, validation, and testing sets.
- Visualize feature distributions and analyze correlations.
- Model Training:
- Train classifiers such as Logistic Regression, SVM, and Random Forest.
- Tune hyperparameters using grid search or manual exploration.
- Ensemble Methods:
- Combine individual classifiers into an ensemble for improved performance.
- Comparison with Human Performance:
- Compare model accuracy with human accuracy and analyze the results.
- Jupyter Notebook for analysis and training.
- Libraries: Scikit-learn, Matplotlib, Seaborn.
Apply PCA and other dimensionality reduction techniques to analyze rock images and identify clusters.
- PCA Analysis:
- Determine the number of components required to retain 90% variance.
- Reconstruct and visualize images with reduced dimensionality.
- Visualization:
- Use PCA, t-SNE, LLE, and MDS to create 2D scatter plots of image embeddings.
- Comparison with Human Features:
- Use Procrustes analysis to compare embeddings with human-labeled features.
- Clustering:
- Perform clustering using K-Means and EM.
- Visualize and evaluate clustering performance.
- Neural Networks:
- Build a feedforward neural network (using dense and CNN layers).
- Procrustes analysis with human data.
- Image processing: OpenCV, Matplotlib.
- Dimensionality Reduction: Scikit-learn.
- Procrustes Analysis: SciPy.
Evaluate the performance of a pre-trained models on dataset.
- Model Description:
- Analyze the architecture and parameter breakdown of the selected CLIP model.
- Image Classification:
- Evaluate model accuracy across five conditions (realistic, geons, silhouettes, blurred, features).
- Compare results with human performance.
- Dimensionality Reduction and Visualization:
- Use t-SNE to visualize embeddings in 2D space.
- Add object images to the embedding visualization.
- Analysis:
- Compare model embeddings with human-labeled features using Procrustes analysis.
- Compute correlation coefficients and analyze significance.
- Pre-trained models: Huggingface Transformers.
- Visualization: Matplotlib, t-SNE.
- Libraries: NumPy, SciPy, Scikit-learn.