Merge pull request #7 from josemoracard/jose1-README
fixed text README.md, learn.json, solution.ipynb
alesanchezr authored May 7, 2024
2 parents 5e9e853 + 192afbc commit d80c958
Showing 4 changed files with 31 additions and 19 deletions.
18 changes: 12 additions & 6 deletions README.es.md
@@ -4,15 +4,15 @@

- Understand a new dataset.
- Process it by applying exploratory data analysis (EDA).
- Model the data using regularized linear regression.
- Model the data by building a decision tree.
- Analyze the results and optimize the model.

## 🌱 How to start this project
## 🌱 How to start this project

Follow the instructions below:

1. Create a new repository based on the [Machine Learing project](https://github.com/4GeeksAcademy/machine-learning-python-template/generate) by [clicking here](https://github.com/4GeeksAcademy/machine-learning-python-template).
2. Open the newly created repository in Codespace using the [Codespace button extension](https://docs.github.com/en/codespaces/developing-in-codespaces/creating-a-codespace-for-a-repository#creating-a-codespace-for-a-repository).
1. Create a new repository based on the [Machine Learning project](https://github.com/4GeeksAcademy/machine-learning-python-template) by [clicking here](https://github.com/4GeeksAcademy/machine-learning-python-template/generate).
2. Open the newly created repository in Codespace using the [Codespace button extension](https://docs.github.com/es/codespaces/developing-in-codespaces/creating-a-codespace-for-a-repository#creating-a-codespace-for-a-repository).
3. Once the Codespace VSCode has finished opening, start your project by following the instructions below.

## 🚛 How to deliver this project
@@ -27,7 +27,13 @@ This dataset originally comes from the National Institute of Diabetes

#### Step 1: Loading the dataset

The dataset can be found in this project folder under the name `diabetes.csv`. You can load it into your code directly from the link (`https://raw.githubusercontent.com/4GeeksAcademy/decision-tree-project-tutorial/main/diabetes.csv`) or download it and add it by hand to your repository. In this dataset you will find the following variables:
The dataset can be found in this project folder under the name `diabetes.csv`. You can load it into your code directly from the following link:

```text
https://raw.githubusercontent.com/4GeeksAcademy/decision-tree-project-tutorial/main/diabetes.csv
```

Or download it and add it by hand to your repository. In this dataset you will find the following variables:

- `Pregnancies`. Number of pregnancies of the patient (numeric)
- `Glucose`. Plasma glucose concentration 2 hours after an oral glucose tolerance test (numeric)
@@ -57,4 +63,4 @@ After training the tree with the different purity functions, select

Store the model in the corresponding folder.

> NOTE: Solution: https://github.com/4GeeksAcademy/decision-tree-project-tutorial/blob/main/solution.ipynb
> Note: We have also included solution samples in `./solution.ipynb`, which we honestly suggest you use only if you have been stuck for more than 30 minutes or if you have already finished and want to compare them with your approach.
28 changes: 17 additions & 11 deletions README.md
@@ -1,51 +1,57 @@
<!-- hide -->
# Decision trees - Steep by steep guide
# Decision trees - Step by step guide
<!-- endhide -->

- Understand a new dataset.
- Process it by applying exploratory data analysis (EDA).
- Model the data using logistic regression.
- Model the data building a decision tree.
- Analyze the results and optimize the model.

## 🌱 How to start this project
## 🌱 How to start this project

Follow the instructions below:

1. Create a new repository based on [machine learning project](https://github.com/4GeeksAcademy/machine-learning-python-template/generate) by [clicking here](https://github.com/4GeeksAcademy/machine-learning-python-template).
1. Create a new repository based on [machine learning project](https://github.com/4GeeksAcademy/machine-learning-python-template) by [clicking here](https://github.com/4GeeksAcademy/machine-learning-python-template/generate).
2. Open the newly created repository in Codespace using the [Codespace button extension](https://docs.github.com/en/codespaces/developing-in-codespaces/creating-a-codespace-for-a-repository#creating-a-codespace-for-a-repository).
3. Once the Codespace VSCode has finished opening, start your project by following the instructions below.

## 🚛 How to deliver this project

Once you have finished solving the exercises, be sure to commit your changes, push to your repository and go to 4Geeks.com to upload the repository link.
Once you have finished solving the exercises, be sure to commit your changes, push them to your repository, and go to 4Geeks.com to upload the repository link.

## 📝 Instructions

### Predicting Diabetes

This dataset originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases. The goal is to predict based on diagnostic measures whether or not a patient has diabetes.
This dataset originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases. The goal is to predict, based on diagnostic measures, whether or not a patient has diabetes.

#### Step 1: Loading the dataset

The dataset can be found in this project folder under the name `diabetes.csv`. You can load it into the code directly from the link (`https://raw.githubusercontent.com/4GeeksAcademy/decision-tree-project-tutorial/main/diabetes.csv`) or download it and add it by hand in your repository. In this dataset you will find the following variables:
The dataset can be found in this project folder under the name `diabetes.csv`. You can load it into the code directly from the link:

```text
https://raw.githubusercontent.com/4GeeksAcademy/decision-tree-project-tutorial/main/diabetes.csv
```

Or download it and add it by hand in your repository. In this dataset, you will find the following variables:

- `Pregnancies`. Number of pregnancies of the patient (numeric)
- `Glucose`. Plasma glucose concentration 2 hours after an oral glucose tolerance test (numeric)
- `BloodPressure`. Diastolic blood pressure (measured in mm Hg) (numeric)
- `SkinThickness`. Triceps skinfold thickness (measured in mm) (numeric)
- `SkinThickness`. Triceps skin fold thickness (measured in mm) (numeric)
- `Insulin`. 2-hour serum insulin (measured in mu U/ml) (numeric)
- `BMI`. Body mass index (numeric)
- `DiabetesPedigreeFunction`. Diabetes Pedigree Function (numeric)
- `Age`. Age of patient (numeric)
- `Outcome`. Class variable (0 or 1), being 0 negative in diabetes and 1, positive (numeric)
- `Outcome`. Class variable (0 or 1), being 0 negative in diabetes and 1 positive (numeric)
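
As an illustration of the loading step, here is a minimal sketch that parses a CSV with the nine columns listed above using only the standard library. The helper name `load_rows` and the two sample rows are made up for this example; they are not part of the project or the real dataset. In practice `pandas.read_csv` is the more common choice, and it accepts the raw URL directly.

```python
import csv
import io

# Column names as listed in the README.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin",
    "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def load_rows(text_stream):
    """Parse CSV text into a list of dicts with float values.

    Hypothetical helper for illustration; raises if a column is missing.
    """
    reader = csv.DictReader(text_stream)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [{name: float(row[name]) for name in EXPECTED_COLUMNS} for row in reader]


# Two made-up rows, used only to show the expected shape of the file.
sample = io.StringIO(
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,"
    "BMI,DiabetesPedigreeFunction,Age,Outcome\n"
    "2,120,70,25,80,28.5,0.45,33,0\n"
    "5,165,82,30,150,34.1,0.71,51,1\n"
)
rows = load_rows(sample)
```

For a downloaded copy, the same helper could be called as `load_rows(open("diabetes.csv", newline=""))`, assuming the file sits in the working directory.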

#### Step 2: Perform a full EDA

This second step is vital to ensure that we keep the variables that are strictly necessary and eliminate those that are not relevant or do not provide information. Use the example Notebook we worked on and adapt it to this use case.

Be sure to conveniently divide the data set into `train` and `test` as we have seen in previous lessons.
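
The split itself can be sketched in a few lines. This is a standard-library stand-in, not the method from the lessons (which more likely rely on scikit-learn's `train_test_split`); the function name, the 20% test fraction, and the seed are assumptions chosen for illustration.

```python
import random


def train_test_split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder data: ten integers standing in for ten dataset rows.
data = list(range(10))
train, test = train_test_split_rows(data)
```

Fixing the seed matters when comparing models: every variant should be evaluated on the same held-out rows.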

#### Step 3: Build a regression model
#### Step 3: Build a decision tree

Start solving the problem by implementing a decision tree and analyze which of the two types satisfies your needs. Train it and analyze its results. Try modifying the function for calculating the purity of the nodes and use all the available ones. Describe them and analyze your results by graphing them.
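
To make "the function for calculating the purity of the nodes" concrete, the sketch below implements the two most common impurity measures, Gini and entropy, plus the weighted impurity of a binary split. All function names are illustrative. In scikit-learn, the equivalent choice is made through the `criterion` parameter of `DecisionTreeClassifier`, which accepts `"gini"` and `"entropy"` (recent versions also accept `"log_loss"`).

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return the fraction of samples belonging to each class in a node."""
    if not labels:
        raise ValueError("a node must contain at least one label")
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2). Zero for a pure node."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy in bits: sum(-p_k * log2(p_k)). Zero for a pure node."""
    return sum(-p * math.log2(p) for p in class_proportions(labels))


def weighted_impurity(left, right, impurity):
    """Impurity of a split, weighting each child by its share of samples."""
    total = len(left) + len(right)
    return sum(len(side) / total * impurity(side) for side in (left, right) if side)


# A perfectly mixed binary node versus a split that separates it cleanly.
mixed_node = [0, 0, 1, 1]
mixed_gini = gini(mixed_node)
mixed_entropy = entropy(mixed_node)
clean_split = weighted_impurity([0, 0], [1, 1], gini)
```

A tree grows by picking, at each node, the split with the lowest weighted impurity; comparing criteria amounts to swapping the `impurity` argument.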

@@ -57,4 +63,4 @@ After training the tree with the different purity functions, it selects the best

Store the model in the corresponding folder.
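
One possible way to store the model is with `pickle` from the standard library, which also works for fitted scikit-learn estimators. The README does not name the folder or file, so the `models/decision_tree.pkl` path below is an assumption, and the dictionary is only a stand-in for a trained model.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize `model` to `path`, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(model, handle)
    return path


def load_model(path):
    """Load a model previously written by `save_model`."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


# Stand-in object; a fitted estimator would be passed in the same way.
model = {"criterion": "gini", "max_depth": 3}

# A temporary directory keeps this example from touching the real project tree.
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_model(model, Path(tmp) / "models" / "decision_tree.pkl")
    restored = load_model(saved_path)
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.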

> NOTE: Solution: https://github.com/4GeeksAcademy/decision-tree-project-tutorial/blob/main/solution.ipynb
> Note: We also incorporated the solution samples on `./solution.ipynb` that we strongly suggest you only use if you are stuck for more than 30 min or if you have already finished and want to compare it with your approach.
2 changes: 1 addition & 1 deletion learn.json
@@ -10,5 +10,5 @@
"syntax": "python",
"duration" : 2,
"projectType": "project",
"description" : "Use decision tree algorithm to diagnose diabetes by using patiente medical information from previous medical exams"
"description" : "Use decision tree algorithm to diagnose diabetes by using patients medical information from previous medical exams"
}
2 changes: 1 addition & 1 deletion solution.ipynb
@@ -386,7 +386,7 @@
}
],
"source": [
"# There is not required a normalization process for the variables to train this model\n",
"# No normalization of variables is needed for training this model\n",
"\n",
"# Feature selection\n",
"\n",