`README.es.md`: 12 additions and 6 deletions
- Understand a new dataset.
- Process it by applying exploratory data analysis (EDA).
- Model the data by building a decision tree.
- Analyze the results and optimize the model.
## 🌱 How to start this project
Follow the instructions below:
1. Create a new repository based on the [Machine Learning project](https://github.com/4GeeksAcademy/machine-learning-python-template) by [clicking here](https://github.com/4GeeksAcademy/machine-learning-python-template/generate).
2. Open the newly created repository in Codespace using the [Codespace button extension](https://docs.github.com/es/codespaces/developing-in-codespaces/creating-a-codespace-for-a-repository#creating-a-codespace-for-a-repository).
3. Once the Codespace VSCode has finished opening, start your project by following the instructions below.
## 🚛 How to deliver this project
#### Step 1: Loading the dataset
The dataset can be found in this project folder under the name `diabetes.csv`. You can load it into the code directly from the following link:

`https://raw.githubusercontent.com/4GeeksAcademy/decision-tree-project-tutorial/main/diabetes.csv`
> Note: We also include sample solutions in `./solution.ipynb`, which we strongly suggest you use only if you are stuck for more than 30 minutes or if you have already finished and want to compare them with your approach.
- Process it by applying exploratory data analysis (EDA).
- Model the data by building a decision tree.
- Analyze the results and optimize the model.
## 🌱 How to start this project
Follow the instructions below:
1. Create a new repository based on the [machine learning project](https://github.com/4GeeksAcademy/machine-learning-python-template) by [clicking here](https://github.com/4GeeksAcademy/machine-learning-python-template/generate).
2. Open the newly created repository in Codespace using the [Codespace button extension](https://docs.github.com/en/codespaces/developing-in-codespaces/creating-a-codespace-for-a-repository#creating-a-codespace-for-a-repository).
3. Once the Codespace VSCode has finished opening, start your project by following the instructions below.
## 🚛 How to deliver this project
Once you have finished solving the exercises, be sure to commit your changes, push them to your repository, and go to 4Geeks.com to upload the repository link.
## 📝 Instructions
### Predicting Diabetes
This dataset originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases. The goal is to predict, based on diagnostic measures, whether or not a patient has diabetes.
#### Step 1: Loading the dataset
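The dataset can be found in this project folder under the name `diabetes.csv`. You can load it into the code directly from the following link:

`https://raw.githubusercontent.com/4GeeksAcademy/decision-tree-project-tutorial/main/diabetes.csv`

Alternatively, download it and add it to your repository manually. In this dataset, you will find the following variables: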
- `Pregnancies`. Number of pregnancies of the patient (numeric)
- `Glucose`. Plasma glucose concentration 2 hours after an oral glucose tolerance test (numeric)
- `BloodPressure`. Diastolic blood pressure (measured in mm Hg) (numeric)
- `SkinThickness`. Triceps skin fold thickness (measured in mm) (numeric)
- `Insulin`. 2-hour serum insulin (measured in μU/mL) (numeric)
- `BMI`. Body mass index (numeric)
- `DiabetesPedigreeFunction`. Diabetes pedigree function, a score of the likelihood of diabetes based on family history (numeric)
- `Age`. Age of the patient (in years) (numeric)
- `Outcome`. Class variable (0 or 1), where 0 means the patient does not have diabetes and 1 means the patient does (numeric)
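A minimal sketch of Step 1 with pandas, assuming pandas is installed in your environment. `DATA_URL` is the link given above and `EXPECTED_COLUMNS` mirrors the variable list; the helper names `check_columns` and `load_diabetes` are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Raw CSV link given in Step 1
DATA_URL = (
    "https://raw.githubusercontent.com/4GeeksAcademy/"
    "decision-tree-project-tutorial/main/diabetes.csv"
)

# Column names taken from the variable list above
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def check_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fail early if any documented column is missing."""
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df


def load_diabetes(source: str = DATA_URL) -> pd.DataFrame:
    """Hypothetical helper: read the CSV from the URL or from a local path."""
    return check_columns(pd.read_csv(source))
```

`df = load_diabetes()` reads the file from the link (this needs network access), while `load_diabetes("diabetes.csv")` reads the copy stored in your repository.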
#### Step 2: Perform a full EDA
This second step is vital to ensure that we keep only the variables that are strictly necessary and eliminate those that are irrelevant or provide no information. Use the example notebook we worked on and adapt it to this use case.
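One possible starting point for the EDA, sketched with pandas and NumPy. It assumes (as is commonly done with this dataset, but worth confirming in your own analysis) that a value of 0 in `Glucose`, `BloodPressure`, `SkinThickness`, `Insulin` or `BMI` is not a plausible measurement and really means "missing". The hypothetical helper `basic_eda` drops duplicate rows, reports the share of missing values per column and ranks features by their linear correlation with `Outcome`, which is only a rough first signal of relevance.

```python
import numpy as np
import pandas as pd

# Assumption: in these columns a 0 is not a plausible measurement, so treat it as missing
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def basic_eda(df: pd.DataFrame, target: str = "Outcome"):
    """Hypothetical helper returning a cleaned copy, missing-value shares and target correlations."""
    clean = df.drop_duplicates().copy()
    cols = [col for col in ZERO_AS_MISSING if col in clean.columns]
    clean[cols] = clean[cols].replace(0, np.nan)

    # Fraction of missing values per column (after the 0 -> NaN conversion)
    missing_share = clean.isna().mean()

    # Pearson correlation of each feature with the target; pandas excludes NaN pairwise
    correlations = clean.corr()[target].drop(target).sort_values(ascending=False)
    return clean, missing_share, correlations
```

For example, `clean, missing, corr = basic_eda(df)`; from there you can decide, column by column, whether to impute or drop the missing values.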
Be sure to split the dataset into `train` and `test` sets, as we have seen in previous lessons.
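A minimal sketch of the split using scikit-learn's `train_test_split`. The 80/20 ratio, `random_state=42` and stratifying on `Outcome` (so both sets keep a similar proportion of diabetic and non-diabetic patients) are assumptions, not requirements of the project; `split_dataset` is a hypothetical helper name.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

TARGET = "Outcome"


def split_dataset(df: pd.DataFrame, test_size: float = 0.2, random_state: int = 42):
    """Hypothetical helper: separate the features from the target and split both."""
    X = df.drop(columns=[TARGET])
    y = df[TARGET]
    # stratify=y keeps the class balance of Outcome similar in train and test
    return train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
```

Usage: `X_train, X_test, y_train, y_test = split_dataset(df)`.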
#### Step 3: Build a decision tree
Start solving the problem by implementing a decision tree and analyze which of the two types (classification or regression) suits your needs. Train it and analyze its results. Try changing the function used to measure node purity (the splitting criterion), testing all the available ones. Describe each of them and analyze your results by plotting them.
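A hedged sketch of comparing purity functions with scikit-learn's `DecisionTreeClassifier` (a classification tree, since `Outcome` is binary). Its `criterion` parameter selects the purity function: `"gini"` uses Gini impurity, `1 - Σ p_k²`, and `"entropy"` uses Shannon entropy, `-Σ p_k log2(p_k)`, where `p_k` is the share of class `k` in a node. Recent scikit-learn versions also accept `"log_loss"`, which is equivalent to `"entropy"`. Using accuracy as the metric and the helper name `compare_criteria` are assumptions for illustration.

```python
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier


def compare_criteria(X_train, X_test, y_train, y_test,
                     criteria=("gini", "entropy"), random_state=42):
    """Hypothetical helper: train one tree per purity function and return its test accuracy."""
    scores = {}
    for criterion in criteria:
        model = DecisionTreeClassifier(criterion=criterion, random_state=random_state)
        model.fit(X_train, y_train)
        scores[criterion] = accuracy_score(y_test, model.predict(X_test))
    return scores
```

With the train/test split from Step 2, `compare_criteria(X_train, X_test, y_train, y_test)` returns a dictionary with one accuracy per criterion that you can plot, for instance as a bar chart. An unrestricted tree tends to overfit, so hyperparameters such as `max_depth` are natural candidates when you later optimize the model.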
> Note: We also include sample solutions in `./solution.ipynb`, which we strongly suggest you use only if you are stuck for more than 30 minutes or if you have already finished and want to compare them with your approach.