Merge pull request #7 from josemoracard/jose1-README

alesanchezr · web-flow · commit d80c9589d6a6 · 2024-05-07T16:16:18.000-04:00
fixed text README.md, learn.json, solution.ipynb
diff --git a/README.es.md b/README.es.md
@@ -4,15 +4,15 @@
 
 - Comprender un dataset nuevo.
 - Procesarlo aplicando un análisis exploratorio (EDA).
-- Modelar los datos utilizando la regresión lineal regularizada.
+- Modelar los datos construyendo un árbol de decisión.
 - Analizar los resultados y optimizar el modelo.
 
-## 🌱  Cómo iniciar este proyecto
+## 🌱 Cómo iniciar este proyecto
 
 Sigue las siguientes instrucciones:
 
-1. Crea un nuevo repositorio basado en el [proyecto de Machine Learing](https://github.com/4GeeksAcademy/machine-learning-python-template/generate) [haciendo clic aquí](https://github.com/4GeeksAcademy/machine-learning-python-template).
-2. Abre el repositorio creado recientemente en Codespace usando la [extensión del botón de Codespace](https://docs.github.com/en/codespaces/developing-in-codespaces/creating-a-codespace-for-a-repository#creating-a-codespace-for-a-repository).
+1. Crea un nuevo repositorio basado en el [proyecto de Machine Learning](https://github.com/4GeeksAcademy/machine-learning-python-template) [haciendo clic aquí](https://github.com/4GeeksAcademy/machine-learning-python-template/generate).
+2. Abre el repositorio creado recientemente en Codespace usando la [extensión del botón de Codespace](https://docs.github.com/es/codespaces/developing-in-codespaces/creating-a-codespace-for-a-repository#creating-a-codespace-for-a-repository).
 3. Una vez que el VSCode del Codespace haya terminado de abrirse, comienza tu proyecto siguiendo las instrucciones a continuación.
 
 ## 🚛 Cómo entregar este proyecto
@@ -27,7 +27,13 @@ Este conjunto de datos proviene originalmente del Instituto Nacional de Diabetes
 
 #### Paso 1: Carga del conjunto de datos
 
-El conjunto de datos se puede encontrar en esta carpeta de proyecto bajo el nombre `diabetes.csv`. Puedes cargarlo en el código directamente desde el enlace (`https://raw.githubusercontent.com/4GeeksAcademy/decision-tree-project-tutorial/main/diabetes.csv`) o descargarlo y añadirlo a mano en tu repositorio. En este conjunto de datos encontrarás las siguientes variables:
+El conjunto de datos se puede encontrar en esta carpeta de proyecto bajo el nombre `diabetes.csv`. Puedes cargarlo en el código directamente desde el siguiente enlace:
+
+```text
+https://raw.githubusercontent.com/4GeeksAcademy/decision-tree-project-tutorial/main/diabetes.csv
+```
+
+O descargarlo y añadirlo a mano en tu repositorio. En este conjunto de datos encontrarás las siguientes variables:
 
 - `Pregnancies`. Número de embarazos del paciente (numérico)
 - `Glucose`. Concentración de glucosa en plasma a las 2 horas de un test de tolerancia oral a la glucosa (numérico)
@@ -57,4 +63,4 @@ Después de entrenar el árbol con las distintas funciones de pureza, selecciona
 
 Almacena el modelo en la carpeta correspondiente.
 
-> NOTA: Solución: https://github.com/4GeeksAcademy/decision-tree-project-tutorial/blob/main/solution.ipynb
+> Nota: También incorporamos muestras de solución en `./solution.ipynb` que te sugerimos honestamente que solo uses si estás atascado por más de 30 minutos o si ya has terminado y quieres compararlo con tu enfoque.
diff --git a/README.md b/README.md
@@ -1,51 +1,57 @@
 <!-- hide -->
-# Decision trees - Steep by steep guide
+# Decision trees - Step by step guide
 <!-- endhide -->
 
 - Understand a new dataset.
 - Process it by applying exploratory data analysis (EDA).
-- Model the data using logistic regression.
+- Model the data building a decision tree.
 - Analyze the results and optimize the model.
 
-## 🌱  How to start this project
+## 🌱 How to start this project
 
 Follow the instructions below:
 
-1. Create a new repository based on [machine learning project](https://github.com/4GeeksAcademy/machine-learning-python-template/generate) by [clicking here](https://github.com/4GeeksAcademy/machine-learning-python-template).
+1. Create a new repository based on [machine learning project](https://github.com/4GeeksAcademy/machine-learning-python-template) by [clicking here](https://github.com/4GeeksAcademy/machine-learning-python-template/generate).
 2. Open the newly created repository in Codespace using the [Codespace button extension](https://docs.github.com/en/codespaces/developing-in-codespaces/creating-a-codespace-for-a-repository#creating-a-codespace-for-a-repository).
 3. Once the Codespace VSCode has finished opening, start your project by following the instructions below.
 
 ## 🚛 How to deliver this project
 
-Once you have finished solving the exercises, be sure to commit your changes, push to your repository and go to 4Geeks.com to upload the repository link.
+Once you have finished solving the exercises, be sure to commit your changes, push them to your repository, and go to 4Geeks.com to upload the repository link.
 
 ## 📝 Instructions
 
 ### Predicting Diabetes
 
-This dataset originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases. The goal is to predict based on diagnostic measures whether or not a patient has diabetes.
+This dataset originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases. The goal is to predict, based on diagnostic measures, whether or not a patient has diabetes.
 
 #### Step 1: Loading the dataset
 
-The dataset can be found in this project folder under the name `diabetes.csv`. You can load it into the code directly from the link (`https://raw.githubusercontent.com/4GeeksAcademy/decision-tree-project-tutorial/main/diabetes.csv`) or download it and add it by hand in your repository. In this dataset you will find the following variables:
+The dataset can be found in this project folder under the name `diabetes.csv`. You can load it into the code directly from the link: 
+
+```text
+https://raw.githubusercontent.com/4GeeksAcademy/decision-tree-project-tutorial/main/diabetes.csv
+```
+
+Or download it and add it by hand in your repository. In this dataset, you will find the following variables:
 
 - `Pregnancies`. Number of pregnancies of the patient (numeric)
 - `Glucose`. Plasma glucose concentration 2 hours after an oral glucose tolerance test (numeric)
 - `BloodPressure`. Diastolic blood pressure (measured in mm Hg) (numeric)
-- `SkinThickness`. Triceps skinfold thickness (measured in mm) (numeric)
+- `SkinThickness`. Triceps skin fold thickness (measured in mm) (numeric)
 - `Insulin`. 2-hour serum insulin (measured in mu U/ml) (numeric)
 - `BMI`. Body mass index (numeric)
 - `DiabetesPedigreeFunction`. Diabetes Pedigree Function (numeric)
 - `Age`. Age of patient (numeric)
-- `Outcome`. Class variable (0 or 1), being 0 negative in diabetes and 1, positive (numeric)
+- `Outcome`. Class variable (0 or 1), being 0 negative in diabetes and 1 positive (numeric)
 
 #### Step 2: Perform a full EDA
 
 This second step is vital to ensure that we keep the variables that are strictly necessary and eliminate those that are not relevant or do not provide information. Use the example Notebook we worked on and adapt it to this use case.
 
 Be sure to conveniently divide the data set into `train` and `test` as we have seen in previous lessons.
 
-#### Step 3: Build a regression model
+#### Step 3: Build a decision tree
 
 Start solving the problem by implementing a decision tree and analyze which of the two types satisfies your needs. Train it and analyze its results. Try modifying the function for calculating the purity of the nodes and use all the available ones. Describe them and analyze your results by graphing them.
 
@@ -57,4 +63,4 @@ After training the tree with the different purity functions, it selects the best
 
 Store the model in the corresponding folder.
 
-> NOTE: Solution: https://github.com/4GeeksAcademy/decision-tree-project-tutorial/blob/main/solution.ipynb
+> Note: We also incorporated the solution samples on `./solution.ipynb` that we strongly suggest you only use if you are stuck for more than 30 min or if you have already finished and want to compare it with your approach.
diff --git a/learn.json b/learn.json
@@ -10,5 +10,5 @@
 	"syntax": "python",
 	"duration" : 2,
     "projectType": "project",
-	"description" : "Use decision tree algorithm to diagnose diabetes by using patiente medical information from previous medical exams"
+	"description" : "Use decision tree algorithm to diagnose diabetes by using patients medical information from previous medical exams"
 }
diff --git a/solution.ipynb b/solution.ipynb
@@ -386,7 +386,7 @@
                 }
             ],
             "source": [
-                "# There is not required a normalization process for the variables to train this model\n",
+                "# No normalization of variables is needed for training this model\n",
                 "\n",
                 "# Feature selection\n",
                 "\n",

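
Step 3 of the README in this diff asks students to vary the function that measures node purity. As a rough illustration only, and not taken from `solution.ipynb`, the sketch below computes two common purity measures, Gini impurity and entropy, for a list of class labels such as the `Outcome` column. The function names are hypothetical; in scikit-learn the equivalent choice is made through the `criterion` parameter of `DecisionTreeClassifier`.

```python
from collections import Counter
from math import log2


def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum(p_k ** 2). Zero means a pure node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of a node in bits: sum(-p_k * log2(p_k))."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(-(count / n) * log2(count / n) for count in Counter(labels).values())


def weighted_impurity(left, right, impurity):
    """Impurity of a candidate split: child impurities weighted by child size."""
    n = len(left) + len(right)
    if n == 0:
        return 0.0
    return (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)


if __name__ == "__main__":
    # Hypothetical labels (0 = negative, 1 = positive), not taken from diabetes.csv.
    node = [0, 0, 1, 1]
    print("gini:", gini_impurity(node))
    print("entropy:", entropy(node))
    print("split gini:", weighted_impurity([0, 0], [1, 1], gini_impurity))
```

For a perfectly mixed binary node such as `[0, 0, 1, 1]`, Gini impurity is 0.5 and entropy is 1.0; both measures are 0 for a pure node, which is why a split that separates the classes completely has a weighted impurity of 0.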