Commit d80c958

Merge pull request #7 from josemoracard/jose1-README
fixed text README.md, learn.json, solution.ipynb
2 parents 5e9e853 + 192afbc commit d80c958

4 files changed, +31 -19 lines changed

README.es.md

Lines changed: 12 additions & 6 deletions
@@ -4,15 +4,15 @@
 
 - Understand a new dataset.
 - Process it by applying exploratory data analysis (EDA).
-- Model the data using regularized linear regression.
+- Model the data by building a decision tree.
 - Analyze the results and optimize the model.
 
-## 🌱 How to start this project
+## 🌱 How to start this project
 
 Follow the instructions below:
 
-1. Create a new repository based on the [Machine Learing project](https://github.com/4GeeksAcademy/machine-learning-python-template/generate) [by clicking here](https://github.com/4GeeksAcademy/machine-learning-python-template).
-2. Open the newly created repository in Codespace using the [Codespace button extension](https://docs.github.com/en/codespaces/developing-in-codespaces/creating-a-codespace-for-a-repository#creating-a-codespace-for-a-repository).
+1. Create a new repository based on the [Machine Learning project](https://github.com/4GeeksAcademy/machine-learning-python-template) [by clicking here](https://github.com/4GeeksAcademy/machine-learning-python-template/generate).
+2. Open the newly created repository in Codespace using the [Codespace button extension](https://docs.github.com/es/codespaces/developing-in-codespaces/creating-a-codespace-for-a-repository#creating-a-codespace-for-a-repository).
 3. Once the Codespace VSCode has finished opening, start your project by following the instructions below.
 
 ## 🚛 How to deliver this project
@@ -27,7 +27,13 @@ This dataset originally comes from the National Institute of Diabetes
 
 #### Step 1: Loading the dataset
 
-The dataset can be found in this project folder under the name `diabetes.csv`. You can load it into the code directly from the link (`https://raw.githubusercontent.com/4GeeksAcademy/decision-tree-project-tutorial/main/diabetes.csv`) or download it and add it by hand to your repository. In this dataset you will find the following variables:
+The dataset can be found in this project folder under the name `diabetes.csv`. You can load it into the code directly from the following link:
+
+```text
+https://raw.githubusercontent.com/4GeeksAcademy/decision-tree-project-tutorial/main/diabetes.csv
+```
+
+Or download it and add it by hand to your repository. In this dataset you will find the following variables:
 
 - `Pregnancies`. Number of pregnancies of the patient (numeric)
 - `Glucose`. Plasma glucose concentration 2 hours after an oral glucose tolerance test (numeric)
@@ -57,4 +63,4 @@ After training the tree with the different purity functions, select
 
 Store the model in the corresponding folder.
 
-> NOTE: Solution: https://github.com/4GeeksAcademy/decision-tree-project-tutorial/blob/main/solution.ipynb
+> Note: We also include sample solutions in `./solution.ipynb`, which we honestly suggest you only use if you are stuck for more than 30 minutes or if you have already finished and want to compare them with your approach.
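Step 1 of the README asks you to load `diabetes.csv`, and Step 2 asks you to split it into `train` and `test`. What follows is a minimal, standard-library-only sketch of those two steps, assuming the CSV has a header row whose column names match the variables listed in the README. The helper names `read_rows`, `load_from_url`, and `split_train_test` are hypothetical and not part of the project template. In practice, `pandas.read_csv` (which accepts a URL directly) and scikit-learn's `train_test_split` cover the same ground.

```python
import csv
import io
import random
import urllib.request
from typing import List, Sequence, Tuple

# Raw file URL taken from the README; fetching it needs network access.
DATASET_URL = (
    "https://raw.githubusercontent.com/4GeeksAcademy/"
    "decision-tree-project-tutorial/main/diabetes.csv"
)

# Column names as listed in the README (assumed to match the CSV header).
FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET = "Outcome"


def read_rows(text_stream) -> Tuple[List[List[float]], List[int]]:
    """Parse CSV text with a header row into a feature matrix X and labels y."""
    X: List[List[float]] = []
    y: List[int] = []
    for row in csv.DictReader(text_stream):
        X.append([float(row[name]) for name in FEATURES])
        y.append(int(row[TARGET]))
    return X, y


def load_from_url(url: str = DATASET_URL) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical helper: download the CSV and parse it with read_rows."""
    with urllib.request.urlopen(url) as response:
        return read_rows(io.TextIOWrapper(response, encoding="utf-8", newline=""))


def split_train_test(X: Sequence, y: Sequence, test_size: float = 0.2, seed: int = 42):
    """Shuffle row indices with a fixed seed and hold out a fraction for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test
```

For example, `X, y = load_from_url()` followed by `split_train_test(X, y)` gives an 80/20 split, and the fixed seed makes it reproducible across runs. This split is not stratified by `Outcome`; scikit-learn's `train_test_split(..., stratify=y)` does that if class balance matters.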

README.md

Lines changed: 17 additions & 11 deletions
@@ -1,51 +1,57 @@
 <!-- hide -->
-# Decision trees - Steep by steep guide
+# Decision trees - Step by step guide
 <!-- endhide -->
 
 - Understand a new dataset.
 - Process it by applying exploratory data analysis (EDA).
-- Model the data using logistic regression.
+- Model the data building a decision tree.
 - Analyze the results and optimize the model.
 
-## 🌱 How to start this project
+## 🌱 How to start this project
 
 Follow the instructions below:
 
-1. Create a new repository based on [machine learning project](https://github.com/4GeeksAcademy/machine-learning-python-template/generate) by [clicking here](https://github.com/4GeeksAcademy/machine-learning-python-template).
+1. Create a new repository based on [machine learning project](https://github.com/4GeeksAcademy/machine-learning-python-template) by [clicking here](https://github.com/4GeeksAcademy/machine-learning-python-template/generate).
 2. Open the newly created repository in Codespace using the [Codespace button extension](https://docs.github.com/en/codespaces/developing-in-codespaces/creating-a-codespace-for-a-repository#creating-a-codespace-for-a-repository).
 3. Once the Codespace VSCode has finished opening, start your project by following the instructions below.
 
 ## 🚛 How to deliver this project
 
-Once you have finished solving the exercises, be sure to commit your changes, push to your repository and go to 4Geeks.com to upload the repository link.
+Once you have finished solving the exercises, be sure to commit your changes, push them to your repository, and go to 4Geeks.com to upload the repository link.
 
 ## 📝 Instructions
 
 ### Predicting Diabetes
 
-This dataset originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases. The goal is to predict based on diagnostic measures whether or not a patient has diabetes.
+This dataset originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases. The goal is to predict, based on diagnostic measures, whether or not a patient has diabetes.
 
 #### Step 1: Loading the dataset
 
-The dataset can be found in this project folder under the name `diabetes.csv`. You can load it into the code directly from the link (`https://raw.githubusercontent.com/4GeeksAcademy/decision-tree-project-tutorial/main/diabetes.csv`) or download it and add it by hand in your repository. In this dataset you will find the following variables:
+The dataset can be found in this project folder under the name `diabetes.csv`. You can load it into the code directly from the link:
+
+```text
+https://raw.githubusercontent.com/4GeeksAcademy/decision-tree-project-tutorial/main/diabetes.csv
+```
+
+Or download it and add it by hand in your repository. In this dataset, you will find the following variables:
 
 - `Pregnancies`. Number of pregnancies of the patient (numeric)
 - `Glucose`. Plasma glucose concentration 2 hours after an oral glucose tolerance test (numeric)
 - `BloodPressure`. Diastolic blood pressure (measured in mm Hg) (numeric)
-- `SkinThickness`. Triceps skinfold thickness (measured in mm) (numeric)
+- `SkinThickness`. Triceps skin fold thickness (measured in mm) (numeric)
 - `Insulin`. 2-hour serum insulin (measured in mu U/ml) (numeric)
 - `BMI`. Body mass index (numeric)
 - `DiabetesPedigreeFunction`. Diabetes Pedigree Function (numeric)
 - `Age`. Age of patient (numeric)
-- `Outcome`. Class variable (0 or 1), being 0 negative in diabetes and 1, positive (numeric)
+- `Outcome`. Class variable (0 or 1), being 0 negative in diabetes and 1 positive (numeric)
 
 #### Step 2: Perform a full EDA
 
 This second step is vital to ensure that we keep the variables that are strictly necessary and eliminate those that are not relevant or do not provide information. Use the example Notebook we worked on and adapt it to this use case.
 
 Be sure to conveniently divide the data set into `train` and `test` as we have seen in previous lessons.
 
-#### Step 3: Build a regression model
+#### Step 3: Build a decision tree
 
 Start solving the problem by implementing a decision tree and analyze which of the two types satisfies your needs. Train it and analyze its results. Try modifying the function for calculating the purity of the nodes and use all the available ones. Describe them and analyze your results by graphing them.
 
@@ -57,4 +63,4 @@ After training the tree with the different purity functions, it selects the best
 
 Store the model in the corresponding folder.
 
-> NOTE: Solution: https://github.com/4GeeksAcademy/decision-tree-project-tutorial/blob/main/solution.ipynb
+> Note: We also incorporated the solution samples on `./solution.ipynb` that we strongly suggest you only use if you are stuck for more than 30 min or if you have already finished and want to compare it with your approach.
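Step 3 of the README asks you to try the different functions that measure node purity. As a hedged illustration of what those functions compute, here is a standard-library sketch of Gini impurity and Shannon entropy, plus the size-weighted impurity of a candidate split, which a tree tries to minimize when it picks a threshold. The function names are illustrative only; in scikit-learn the choice is made through the `criterion` parameter of `DecisionTreeClassifier` (`"gini"`, `"entropy"`, and, in recent versions, `"log_loss"`).

```python
import math
from collections import Counter
from typing import Callable, Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k**2; 0.0 for a pure (or empty) node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy sum_k p_k * log2(1 / p_k), in bits; 0.0 for a pure node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def split_impurity(
    left: Sequence[int],
    right: Sequence[int],
    criterion: Callable[[Sequence[int]], float],
) -> float:
    """Impurity of a split: each child's impurity weighted by its share of samples."""
    n = len(left) + len(right)
    return len(left) / n * criterion(left) + len(right) / n * criterion(right)
```

For a binary target such as `Outcome`, Gini ranges from 0 to 0.5 and entropy from 0 to 1 bit. Both are 0 for a pure node and peak at a 50/50 mix, which is why the two criteria often choose similar splits.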

learn.json

Lines changed: 1 addition & 1 deletion
@@ -10,5 +10,5 @@
 "syntax": "python",
 "duration" : 2,
 "projectType": "project",
-"description" : "Use decision tree algorithm to diagnose diabetes by using patiente medical information from previous medical exams"
+"description" : "Use decision tree algorithm to diagnose diabetes by using patients medical information from previous medical exams"
 }

solution.ipynb

Lines changed: 1 addition & 1 deletion
@@ -386,7 +386,7 @@
 }
 ],
 "source": [
-"# There is not required a normalization process for the variables to train this model\n",
+"# No normalization of variables is needed for training this model\n",
 "\n",
 "# Feature selection\n",
 "\n",
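The notebook comment states that no normalization is needed to train this model. One way to see why: a decision tree splits on a single feature at a threshold (`x <= t`), so only the ordering of that feature's values matters. Any strictly increasing rescaling, such as z-score standardization or min-max scaling, therefore leaves the set of achievable partitions unchanged, up to floating-point rounding. The sketch below checks this for one feature; `candidate_partitions`, `standardize`, and `min_max_scale` are hypothetical helpers written for illustration, not functions from the notebook.

```python
from typing import FrozenSet, List, Sequence, Set


def candidate_partitions(values: Sequence[float]) -> Set[FrozenSet[int]]:
    """Every left-child index set that a threshold split x <= t can produce.

    Only midpoints between consecutive distinct sorted values yield
    different partitions, so those are the only thresholds tried.
    """
    distinct = sorted(set(values))
    partitions: Set[FrozenSet[int]] = set()
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        partitions.add(frozenset(i for i, x in enumerate(values) if x <= threshold))
    return partitions


def standardize(values: Sequence[float]) -> List[float]:
    """z-score scaling (x - mean) / std; strictly increasing when std > 0."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((x - mean) ** 2 for x in values) / n) ** 0.5
    return [(x - mean) / std for x in values]


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Min-max scaling to [0, 1]; strictly increasing when max > min."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]
```

Because the partitions match, an impurity criterion scores each of them identically before and after scaling, so the fitted tree's structure should not change; only the threshold values are expressed in the new units.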
