|
4 | 4 | "cell_type": "markdown",
|
5 | 5 | "metadata": {},
|
6 | 6 | "source": [
|
7 |
| - "# Pandas Basics\n", |
| 7 | + "# pandas Basics\n", |
8 | 8 | "\n",
|
9 |
| - "[*Pandas*](http://pandas.pydata.org/) is a popular data science tool for managing, manipulating and analyzing column-oriented data." |
| 9 | + "[*pandas*](http://pandas.pydata.org/) is a popular data science tool for managing, manipulating and analyzing column-oriented data." |
10 | 10 | ]
|
11 | 11 | },
|
12 | 12 | {
|
|
24 | 24 | "source": [
|
25 | 25 | "In this notebook, we will cover:\n",
|
26 | 26 | "\n",
|
27 |
| - "- Pandas data structures\n", |
| 27 | + "- pandas data structures\n", |
28 | 28 | "- Description of the data\n",
|
29 | 29 | "- Loading data from CSV\n",
|
30 | 30 | "- Inspecting the data\n",
|
31 | 31 | "- Data selection and filtering\n",
|
32 | 32 | "- Data transformation\n",
|
33 | 33 | "- Sorting values\n",
|
34 | 34 | "\n",
|
35 |
| - "## 1. Pandas data structures\n", |
| 35 | + "## 1. pandas data structures\n", |
36 | 36 | "\n",
|
37 | 37 | "We'll work with two main [data structures](https://en.wikipedia.org/wiki/Data_structure) offered by pandas:\n",
|
38 | 38 | "\n",
|
|
41 | 41 | "\n",
|
42 | 42 | "You can create `Series` and `DataFrame` objects by passing the desired values explicitly, e.g. passing a list for a `Series` and a dictionary in the format `{'column': [list of values]}` for a `DataFrame`.\n",
|
43 | 43 | "\n",
|
44 |
| - "Pandas also offers some helper functions to load data from specific formats, e.g. `read_csv()`.\n", |
| 44 | + "pandas also offers some helper functions to load data from specific formats, e.g. `read_csv()`.\n", |
45 | 45 | "\n",
|
46 | 46 | "\n",
|
47 | 47 | "## 2. Description of the data\n",
|
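The section 1 text in this hunk describes building a `Series` from a list and a `DataFrame` from a `{'column': [list of values]}` dictionary. A minimal sketch of both is shown here; the variable names, column names, and values are invented for illustration and do not come from the notebook.

```python
import pandas as pd

# A Series built from a plain list; pandas assigns a default integer index.
temperatures = pd.Series([21.5, 19.0, 23.2])

# A DataFrame built from a dict in the {'column': [list of values]} format.
# Column names and values are hypothetical sample data.
cities = pd.DataFrame({
    "city": ["Oslo", "Lima", "Cairo"],
    "population_millions": [0.7, 10.1, 10.0],
})

print(temperatures)
print(cities)
```

Each dictionary key becomes a column, and all lists must have the same length so that every row has a value for every column.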
|
86 | 86 | "\n",
|
87 | 87 | "The function `pandas.read_csv()` can be used to load a CSV file into a pandas `DataFrame`.\n",
|
88 | 88 | "\n",
|
89 |
| - "Pandas also supports other file formats out of the box: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html" |
| 89 | + "pandas also supports other file formats out of the box: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html" |
90 | 90 | ]
|
91 | 91 | },
|
92 | 92 | {
|
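The hunk above mentions `pandas.read_csv()` for loading a CSV file into a `DataFrame`. As a rough sketch, the call can be tried without any file on disk by reading from an in-memory buffer; the CSV text and column names here are made up and stand in for the notebook's actual data file, which this diff does not show.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file.
csv_text = (
    "date,product,units\n"
    "2021-01-05,apple,3\n"
    "2021-01-06,pear,5\n"
)

# read_csv accepts a path, a URL, or a file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())
```

With a real file, the buffer would simply be replaced by the file's path.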
|
237 | 237 | "cell_type": "markdown",
|
238 | 238 | "metadata": {},
|
239 | 239 | "source": [
|
240 |
| - "You may notice from the result of running `info()` above that the [dtype](https://pbpython.com/pandas_dtypes.html) of the date column is an object instead of a datetime. This is because Pandas thinks it is a string instead of a date. This is bad not just from an efficiency standpoint, and also because you wouldn't be able to do datetime operations on that column (eg: comparing if it's earlier or later vs another date). It is good practice to process the data to fix this as soon as you load the data!\n", |
| 240 | + "You may notice from the result of running `info()` above that the [dtype](https://pbpython.com/pandas_dtypes.html) of the date column is an object instead of a datetime. This is because pandas treats it as a string instead of a date. This is bad not only from an efficiency standpoint but also because you wouldn't be able to do datetime operations on that column (e.g. checking whether it is earlier or later than another date). It is good practice to process the data and fix this as early as possible!\n", |
241 | 241 | "\n",
|
242 | 242 | "#### Exercise\n",
|
243 | 243 | "\n",
|
|
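The changed paragraph in this hunk explains that a date column loaded as strings cannot be used for datetime comparisons. One common way to address this, sketched here on made-up data, is `pd.to_datetime`; this is not necessarily the approach the notebook's exercise expects, and the column name `date` is an assumption.

```python
import io

import pandas as pd

# Hypothetical CSV content; the real notebook data is not shown in this diff.
csv_text = (
    "date,units\n"
    "2021-01-05,3\n"
    "2021-01-06,5\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Without extra options, read_csv leaves the date column as text.
# Convert it so that datetime operations become available.
df["date"] = pd.to_datetime(df["date"])

# Datetime comparison now works: keep rows strictly before the cutoff.
cutoff = pd.Timestamp("2021-01-06")
earlier = df[df["date"] < cutoff]

print(df.dtypes)
print(earlier)
```

Passing `parse_dates=["date"]` to `read_csv` is an alternative that performs the conversion at load time.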