|
19 | 19 | "cell_type": "markdown",
|
20 | 20 | "metadata": {},
|
21 | 21 | "source": [
|
22 |
| - "You have learned about lists as well as how to write your own functions with loops and conditional statements. As such, you can already write programs performing a variety of tasks. \n", |
| 22 | + "You have learned about lists and how to write your own functions with loops and conditional statements. This allows you to write programs that perform a variety of tasks. \n", |
23 | 23 | "\n",
|
24 |
| - "However, something that you are currently missing is a mechanism to access the data that you want to analyze. A very common way to access these data is through (local or remotely-stored) [files](https://en.wikipedia.org/wiki/Computer_file)." |
| 24 | + "However, a convenient mechanism to access the data that you want to analyze is currently missing. In this notebook, we will explore the use of [files](https://en.wikipedia.org/wiki/Computer_file) since they are a common way to access stored data." |
25 | 25 | ]
|
26 | 26 | },
|
27 | 27 | {
|
|
30 | 30 | "source": [
|
31 | 31 | "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/key.png\">\n",
|
32 | 32 | "\n",
|
33 |
| - "A **file** provides a mechanism for **permanently storing information** so that they can be retrieved when your program and/or your machine are restarted." |
| 33 | + "A **file** provides a mechanism for **permanently storing information**. Thus, the file content is not lost in the event of a [crash](https://en.wikipedia.org/wiki/Crash_(computing)) or [reboot](https://en.wikipedia.org/wiki/Reboot)." |
34 | 34 | ]
|
35 | 35 | },
|
36 | 36 | {
|
|
55 | 55 | "source": [
|
56 | 56 | "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/key.png\">\n",
|
57 | 57 | "\n",
|
58 |
| - "A **binary file** is any other type of file that does not fit the previous definition of text file." |
| 58 | + "A **binary file** is any other type of file that does not fit the previous definition of a text file." |
59 | 59 | ]
|
60 | 60 | },
|
61 | 61 | {
|
62 | 62 | "cell_type": "markdown",
|
63 | 63 | "metadata": {},
|
64 | 64 | "source": [
|
65 |
| - "You can often recognize a text files by looking at the [file extension](https://en.wikipedia.org/wiki/Filename_extension). Extensions commonly in use for text files are: `.txt`, `.asc`, `.xyz`." |
| 65 | + "You can often recognize a text file by looking at the [file extension](https://en.wikipedia.org/wiki/Filename_extension). Extensions commonly in use for text files are: `.txt`, `.asc`, `.xyz`." |
66 | 66 | ]
|
67 | 67 | },
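The extension check described above can be sketched with `os.path.splitext()`, which separates a file name from its extension. This is a minimal sketch that assumes only the three extensions listed in the text count as text files. The name `has_text_extension()` and the sample file names are hypothetical, chosen for illustration.

```python
import os.path

# The extensions mentioned above; real-world text files may use many others
TEXT_EXTENSIONS = {".txt", ".asc", ".xyz"}

def has_text_extension(filename):
    """Return True if the file name ends with one of TEXT_EXTENSIONS (case-insensitive)."""
    _, extension = os.path.splitext(filename)  # "salinity.txt" -> ("salinity", ".txt")
    return extension.lower() in TEXT_EXTENSIONS

print(has_text_extension("salinity.txt"))  # True
print(has_text_extension("SALINITY.TXT"))  # True
print(has_text_extension("photo.png"))     # False
```

An extension is only a naming convention. A file can be renamed to anything, so this check gives a hint rather than a guarantee.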
|
68 | 68 | {
|
|
71 | 71 | "source": [
|
72 | 72 | "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/info.png\">\n",
|
73 | 73 | "\n",
|
74 |
| - "A very simple test to evaluate whether a given file is a text file is to open it in a text editor. If you can understand the visualized content of an opened file, then the file is likely a text file. *(Be warned that opening a file in this way can take a long time depending on the size of the file.)*" |
| 74 | + "A very simple test to evaluate whether a given file is a text file is to open it in a text editor. If you can recognize the visualized content of an opened file as text, then the file is likely a text file. *(Be warned that opening a file in this way can take a long time depending on the size of the file.)*" |
75 | 75 | ]
|
76 | 76 | },
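The "open it and see if you can read it" test can also be approximated in code. The sketch below is a heuristic, not a definitive check. It assumes that text files are encoded in UTF-8 (ASCII is a subset), so a file in another encoding such as UTF-16 may be reported as binary. It reads only the first `sample_size` bytes, which avoids the slowdown on large files mentioned above. The function name `looks_like_text_file()` is hypothetical.

```python
import codecs
import os.path
import tempfile

def looks_like_text_file(path, sample_size=1024):
    """Heuristic check: the first bytes contain no NUL byte and decode as UTF-8."""
    with open(path, mode="rb") as input_file:  # "rb" reads raw bytes, without decoding
        sample = input_file.read(sample_size)
    if b"\x00" in sample:  # NUL bytes are common in binary files and rare in text files
        return False
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut at the end of the sample
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True

# Usage: create one text file and one binary file in a temporary folder
with tempfile.TemporaryDirectory() as tmp_folder:
    text_path = os.path.join(tmp_folder, "example.txt")
    with open(text_path, mode="w", encoding="utf-8") as output_file:
        output_file.write("30.8\n31.2\n")
    binary_path = os.path.join(tmp_folder, "example.bin")
    with open(binary_path, mode="wb") as output_file:
        output_file.write(bytes([0, 159, 255, 1]))
    print(looks_like_text_file(text_path))    # True
    print(looks_like_text_file(binary_path))  # False
```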
|
77 | 77 | {
|
78 | 78 | "cell_type": "markdown",
|
79 | 79 | "metadata": {},
|
80 | 80 | "source": [
|
81 |
| - "We will first introduce some file managing capability of the `os` [Python module](https://docs.python.org/3.6/tutorial/modules.html#modules), then we will describe the use of the functions that Python provides for [reading and writing the content of a text file](https://docs.python.org/3.6/tutorial/inputoutput.html)." |
| 81 | + "We will first introduce some file managing capabilities of the `os.path` [Python module](https://docs.python.org/3.6/tutorial/modules.html#modules), then we will use the functions that Python provides for [reading and writing the content of a text file](https://docs.python.org/3.6/tutorial/inputoutput.html)." |
82 | 82 | ]
|
83 | 83 | },
|
84 | 84 | {
|
|
87 | 87 | "source": [
|
88 | 88 | "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/key.png\">\n",
|
89 | 89 | "\n",
|
90 |
| - "In Python, a **module** is a file containing definitions and statements. The module name is given by the file name without the suffix `.py`." |
| 90 | + "In Python, a **module** is a file containing definitions and statements. " |
| 91 | + ] |
| 92 | + }, |
| 93 | + { |
| 94 | + "cell_type": "markdown", |
| 95 | + "metadata": {}, |
| 96 | + "source": [ |
| 97 | + "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/info.png\">\n", |
| 98 | + "\n", |
| 99 | + "The module name is given by the file name without the [file extension](https://en.wikipedia.org/wiki/Filename_extension). For example, a file `example.py` may identify the module `example`." |
91 | 100 | ]
|
92 | 101 | },
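The rule "module name = file name without the extension" can be expressed with two `os.path` functions. This is a small illustrative sketch. The helper name `module_name_from_file()` is hypothetical and is not how Python itself resolves imports.

```python
import os.path

def module_name_from_file(filename):
    """Return the module name for a Python file name by dropping the folder and the extension."""
    base_name = os.path.basename(filename)  # "scripts/example.py" -> "example.py"
    name, _ = os.path.splitext(base_name)   # "example.py" -> ("example", ".py")
    return name

print(module_name_from_file("example.py"))          # example
print(module_name_from_file("scripts/example.py"))  # example
```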
|
93 | 102 | {
|
|
103 | 112 | "cell_type": "markdown",
|
104 | 113 | "metadata": {},
|
105 | 114 | "source": [
|
106 |
| - "## The `os` module" |
| 115 | + "## The `os.path` module" |
107 | 116 | ]
|
108 | 117 | },
|
109 | 118 | {
|
110 | 119 | "cell_type": "markdown",
|
111 | 120 | "metadata": {},
|
112 | 121 | "source": [
|
113 |
| - "The `os` module provides a **portable** way of using several functionalities [across different operating systems](https://en.wikipedia.org/wiki/Cross-platform_software) (i.e., the same code can run on [Linux Ubuntu](https://en.wikipedia.org/wiki/Ubuntu) and [Microsoft Windows 10](https://en.wikipedia.org/wiki/Windows_10))." |
| 122 | + "We will explore the `os.path` module to retrieve some data files that are stored on the server's hard disk." |
114 | 123 | ]
|
115 | 124 | },
|
116 | 125 | {
|
117 | 126 | "cell_type": "markdown",
|
118 | 127 | "metadata": {},
|
119 | 128 | "source": [
|
120 |
| - "In particular, we will explore the `os.path` sub-module to retrieve some data files that are stored on the server's hard disk.\n", |
| 129 | + "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/info.png\">\n", |
121 | 130 | "\n",
|
122 |
| - "The first required operation is to **import** the `os` module. Then, we will use some of the `os.path` sub-module functionalities and variables to write a function that returns the full path of the folder where this notebook is located:\n", |
| 131 | + "Several functionalities in the `os.path` module are **portable**. This means that they can be used [across different operating systems](https://en.wikipedia.org/wiki/Cross-platform_software). For example, you can use these functionalities in code that runs on both [Linux Ubuntu](https://en.wikipedia.org/wiki/Ubuntu) and [Microsoft Windows 10](https://en.wikipedia.org/wiki/Windows_10)." |
| 132 | + ] |
| 133 | + }, |
| 134 | + { |
| 135 | + "cell_type": "markdown", |
| 136 | + "metadata": {}, |
| 137 | + "source": [ |
| 138 | + "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/key.png\">\n", |
| 139 | + "\n", |
| 140 | + "To use the `os.path` module, you first need to **import** it. Once imported, its functions and variables are available in your code." |
| 141 | + ] |
| 142 | + }, |
| 143 | + { |
| 144 | + "cell_type": "markdown", |
| 145 | + "metadata": {}, |
| 146 | + "source": [ |
| 147 | + "In the example below, we write a `get_current_folder()` function that returns the path of the folder where this notebook is located." |
| 148 | + ] |
| 149 | + }, |
| 150 | + { |
| 151 | + "cell_type": "markdown", |
| 152 | + "metadata": {}, |
| 153 | + "source": [ |
| 154 | + "To achieve this task, we will use a variable and a function from the `os.path` module:\n", |
| 155 | + "\n", |
| 156 | + "- `curdir`: The string used by the [operating system](https://en.wikipedia.org/wiki/Operating_system) to refer to the current directory (e.g., `.` on both Linux and Windows).\n", |
| 157 | + "- `abspath()`: A function that returns the [absolute path](https://en.wikipedia.org/wiki/Path_(computing)#Absolute_and_relative_paths)." |
| 158 | + ] |
| 159 | + }, |
| 160 | + { |
| 161 | + "attachments": {}, |
| 162 | + "cell_type": "markdown", |
| 163 | + "metadata": {}, |
| 164 | + "source": [ |
| 165 | + "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/info.png\">\n", |
123 | 166 | "\n",
|
124 |
| - "- `curdir`: The constant string used by the operating system to refer to the current directory. E.g., `.` for Windows and Linux.\n", |
125 |
| - "- `abspath()`: A function that returns the full, absolute version of a path." |
| 167 | + "An [**absolute path**](https://en.wikipedia.org/wiki/Path_(computing)#Absolute_and_relative_paths) points to the same location in a file system, regardless of the current working directory. In contrast, a [**relative path**](https://en.wikipedia.org/wiki/Path_(computing)#Absolute_and_relative_paths) starts from a given working directory." |
126 | 168 | ]
|
127 | 169 | },
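The difference between relative and absolute paths can be checked with `os.path.isabs()`. This is a minimal sketch. The path `data/example.txt` is hypothetical and does not need to exist, because `os.path.abspath()` only builds the path string from the current working directory.

```python
import os.path

relative_path = os.path.join("data", "example.txt")  # hypothetical relative path
absolute_path = os.path.abspath(relative_path)       # resolved against the current working directory

print(os.path.curdir)                # .
print(os.path.isabs(relative_path))  # False
print(os.path.isabs(absolute_path))  # True
print(absolute_path)                 # depends on the folder where the code runs
```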
|
128 | 170 | {
|
|
131 | 173 | "metadata": {},
|
132 | 174 | "outputs": [],
|
133 | 175 | "source": [
|
134 |
| - "import os\n", |
| 176 | + "import os.path\n", |
135 | 177 | "\n",
|
136 | 178 | "def get_current_folder():\n",
|
137 | 179 | " cur_folder = os.path.abspath(os.path.curdir)\n",
|
|
153 | 195 | "cell_type": "markdown",
|
154 | 196 | "metadata": {},
|
155 | 197 | "source": [
|
156 |
| - "As such, we extend the previous code using `os.path.join()` and `os.path.exist()` functions to:\n", |
| 198 | + "To access the `data` sub-folder, we extend the previous code using the `os.path.join()` and `os.path.exists()` functions to:\n", |
157 | 199 | "\n",
|
158 |
| - "- Create the full path to the `data` sub-folder.\n", |
| 200 | + "- Create the absolute path to the `data` sub-folder.\n", |
159 | 201 | "- Check whether the resulting path actually exists."
|
160 | 202 | ]
|
161 | 203 | },
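The two steps listed above (build the absolute path, then check that it exists) can be sketched as a generic helper. This is not the notebook's own code. The function name `get_sub_folder()` is hypothetical, raising `FileNotFoundError` is a design choice, and a temporary folder stands in for the notebook folder so that the example runs anywhere.

```python
import os.path
import tempfile

def get_sub_folder(parent_folder, sub_folder_name):
    """Return the absolute path of an existing sub-folder of parent_folder."""
    # os.path.join() inserts the separator used by the current operating system
    sub_folder = os.path.join(os.path.abspath(parent_folder), sub_folder_name)
    if not os.path.exists(sub_folder):
        raise FileNotFoundError("Unable to locate the folder: " + sub_folder)
    return sub_folder

# Usage: a temporary folder stands in for the notebook folder
with tempfile.TemporaryDirectory() as tmp_folder:
    os.mkdir(os.path.join(tmp_folder, "data"))
    print(get_sub_folder(tmp_folder, "data"))  # <temporary folder>/data
    try:
        get_sub_folder(tmp_folder, "missing")
    except FileNotFoundError as error:
        print(error)
```

Note that `os.path.exists()` also returns `True` when `data` is a regular file. If you need to be sure it is a folder, `os.path.isdir()` is the stricter check.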
|
|
189 | 231 | "source": [
|
190 | 232 | "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/key.png\">\n",
|
191 | 233 | "\n",
|
192 |
| - "We did not import the `os` module since it was already imported in the previous cell. Re-importing a module does not break your code, but makes it more verbose. " |
| 234 | + "We did not import the `os.path` module since it was already imported in the previous code cell. Re-importing a module does not break your code, but makes it more verbose. " |
| 235 | + ] |
| 236 | + }, |
| 237 | + { |
| 238 | + "cell_type": "markdown", |
| 239 | + "metadata": {}, |
| 240 | + "source": [ |
| 241 | + "However, if you decide to [clear the results of this notebook](000_Welcome_on_Board.ipynb#How-to-Clear-the-Results-of-a-Notebook?), you will need to re-execute the code cell with the `import` statement." |
193 | 242 | ]
|
194 | 243 | },
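Re-importing is harmless because Python caches imported modules in `sys.modules`. A second `import` statement reuses the cached module object instead of loading the module again. The sketch below illustrates this behavior. Clearing the notebook results restarts the Python kernel, which empties that cache.

```python
import sys
import os.path

module_before = sys.modules["os.path"]  # the module object cached by the first import
import os.path  # a second import finds the module in the cache and does not reload it
module_after = sys.modules["os.path"]

print(module_before is module_after)  # True
print(os.path is module_after)        # True
```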
|
195 | 244 | {
|
|
298 | 347 | "cell_type": "markdown",
|
299 | 348 | "metadata": {},
|
300 | 349 | "source": [
|
301 |
| - "As discussed above, a text file is a sequence of characters stored on a permanent medium (e.g., a flash memory)." |
| 350 | + "As discussed above, a text file is a sequence of characters stored on a permanent medium (e.g., a [USB flash drive](https://en.wikipedia.org/wiki/USB_flash_drive))." |
302 | 351 | ]
|
303 | 352 | },
|
304 | 353 | {
|
|
388 | 437 | "cell_type": "markdown",
|
389 | 438 | "metadata": {},
|
390 | 439 | "source": [
|
391 |
| - "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/info.png\">\n", |
| 440 | + "Why are there 100 characters? There are 20 rows in the file. Each row has 4 visible characters (e.g., `30.8`), but there is also an invisible [newline character](https://en.wikipedia.org/wiki/Newline) (i.e., `\\n`) that text editors interpret as a new line. Thus, `(4+1) * 20 = 100` characters." |
| 441 | + ] |
| 442 | + }, |
| 443 | + { |
| 444 | + "cell_type": "markdown", |
| 445 | + "metadata": {}, |
| 446 | + "source": [ |
| 447 | + "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/key.png\">\n", |
392 | 448 | "\n",
|
393 |
| - "Why the characters are 100? Each row has 4 visible characters (e.g., `30.8`) but there is also an invisible character (i.e., `\\n`) that the text editor interprets as a new line. Thus, `(4+1) * 20 = 100` characters." |
| 449 | + "The **newline character** is used to control the end of a line of text and the start of a new one." |
| 450 | + ] |
| 451 | + }, |
| 452 | + { |
| 453 | + "cell_type": "markdown", |
| 454 | + "metadata": {}, |
| 455 | + "source": [ |
| 456 | + "In the code above, the `sal_content` variable holds the content of the file as a single sequence of characters." |
394 | 457 | ]
|
395 | 458 | },
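The character count above can be reproduced with a string in memory. This sketch uses hypothetical content and assumes, for simplicity, that every one of the 20 rows holds the value `30.8`.

```python
# Hypothetical content: 20 rows, each with 4 visible characters followed by "\n"
rows = ["30.8"] * 20
content = "".join(row + "\n" for row in rows)

print(len(rows[0]))         # 4
print(content.count("\n"))  # 20
print(len(content))         # 100, i.e., (4 + 1) * 20
```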
|
396 | 459 | {
|
397 | 460 | "cell_type": "markdown",
|
398 | 461 | "metadata": {},
|
399 | 462 | "source": [
|
400 |
| - "We will now write a function that not only reads the sequence of characters, but also splits them by line (using the `str` method named `splitlines()`) and converts the result in the corresponding `float` value." |
| 463 | + "We will now write a function that not only reads the sequence of characters, but also splits it into multiple lines based on the **newline character** (using the `str` method named `splitlines()`). Finally, we convert each line into the corresponding `float` value and append this value to `sal_list`." |
401 | 464 | ]
|
402 | 465 | },
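The split-and-convert step can be tried on a string before working with the file. This minimal sketch assumes one number per line and skips blank lines. The function name `parse_float_lines()` and the sample content are hypothetical.

```python
def parse_float_lines(text):
    """Split text on newline characters and convert each non-empty line to a float."""
    values = []
    for line in text.splitlines():  # splitlines() removes the newline characters
        if line.strip() == "":      # skip empty lines (e.g., an extra blank line at the end)
            continue
        values.append(float(line))
    return values

content = "30.8\n31.2\n29.5\n"     # hypothetical file content
print(content.splitlines())        # ['30.8', '31.2', '29.5']
print(parse_float_lines(content))  # [30.8, 31.2, 29.5]
```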
|
403 | 466 | {
|
|
503 | 566 | "cell_type": "markdown",
|
504 | 567 | "metadata": {},
|
505 | 568 | "source": [
|
506 |
| - "If you want to write a text file, the first decision to take is the location on where to store the text file. For this collection of notebook, we will use the `output` sub-folder that can be retrieved running the following code:" |
| 569 | + "If you want to write a text file, you need to decide where to store it. For this collection of notebooks, we will use the `output` sub-folder that can be retrieved by running the following code:" |
507 | 570 | ]
|
508 | 571 | },
|
509 | 572 | {
|
|
513 | 576 | "outputs": [],
|
514 | 577 | "source": [
|
515 | 578 | "def get_output_folder():\n",
|
516 |
| - " cur_folder = os.path.abspath(os.path.curdir)\n", |
517 |
| - " output_folder = os.path.join(cur_folder, \"output\")\n", |
| 579 | + " cur_folder = os.path.abspath(os.path.curdir) # The absolute path to the current directory\n", |
| 580 | + " output_folder = os.path.join(cur_folder, \"output\") # The absolute path to the output folder (may or may not exist)\n", |
518 | 581 | " if os.path.exists(output_folder):\n",
|
519 | 582 | " return output_folder\n",
|
520 | 583 | " else: # in case the output folder does not exist, we raise a meaningful error\n",
|
|
545 | 608 | "cell_type": "markdown",
|
546 | 609 | "metadata": {},
|
547 | 610 | "source": [
|
548 |
| - "To write a file, you have to use the `open()` function and pass the `w` mode (`w` is for *write*) as second parameter. We put this function within a function that take a list as a second parameter and write the content into the text file." |
| 611 | + "In the code below, the `write_list_to_disk` function takes:\n", |
| 612 | + "\n", |
| 613 | + "* An `output_path` where the output file is to be written. \n", |
| 614 | + "* An `input_list` containing the data to be written in the output file." |
| 615 | + ] |
| 616 | + }, |
| 617 | + { |
| 618 | + "cell_type": "markdown", |
| 619 | + "metadata": {}, |
| 620 | + "source": [ |
| 621 | + "The function below uses the `open()` function and passes the `w` mode (`w` is for *write*) as the second parameter. " |
| 622 | + ] |
| 623 | + }, |
| 624 | + { |
| 625 | + "cell_type": "markdown", |
| 626 | + "metadata": {}, |
| 627 | + "source": [ |
| 628 | + "<img align=\"left\" width=\"6%\" style=\"padding-right:10px;\" src=\"images/info.png\">\n", |
| 629 | + "\n", |
| 630 | + "You may learn about other modes for opening a file from the official [Python documentation](https://docs.python.org/3.6/library/functions.html?#open)." |
549 | 631 | ]
|
550 | 632 | },
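The practical difference between the `w` mode and the `a` (*append*) mode can be seen by writing to the same file twice. This sketch is illustrative, and the helpers `write_lines()` and `read_text()` are hypothetical. It uses a `with` statement, which closes the file automatically at the end of the block. This is an alternative to calling `close()` explicitly.

```python
import os.path
import tempfile

def write_lines(path, lines, mode):
    """Write each item on its own line, using the given open() mode."""
    with open(path, mode=mode) as output_file:  # the file is closed when the block ends
        for line in lines:
            output_file.write(str(line) + "\n")

def read_text(path):
    """Return the whole content of a text file as a single string."""
    with open(path, mode="r") as input_file:
        return input_file.read()

with tempfile.TemporaryDirectory() as tmp_folder:
    path = os.path.join(tmp_folder, "modes_example.txt")  # hypothetical file name
    write_lines(path, [1, 2], mode="w")
    write_lines(path, [3], mode="w")  # "w" truncates: the previous content is lost
    print(repr(read_text(path)))      # '3\n'
    write_lines(path, [4], mode="a")  # "a" appends after the existing content
    print(repr(read_text(path)))      # '3\n4\n'
```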
|
551 | 633 | {
|
|
556 | 638 | "source": [
|
557 | 639 | "def write_list_to_disk(output_path, input_list):\n",
|
558 | 640 | " \n",
|
559 |
| - " output_file = open(output_path, mode=\"w\")\n", |
| 641 | + " output_file = open(output_path, mode=\"w\") # mode=\"w\" to open the file in writing mode\n", |
560 | 642 | " \n",
|
561 | 643 | " for value in input_list:\n",
|
562 |
| - " line_content = str(value) + \"\\n\" # the \"\\n\" is the 'escaped' character for the new line\n", |
| 644 | + " line_content = str(value) + \"\\n\" # the \"\\n\" is the newline character\n", |
563 | 645 | " output_file.write(line_content)\n",
|
564 | 646 | " \n",
|
565 | 647 | " output_file.close()\n",
|
|
669 | 751 | "* [Computer file](https://en.wikipedia.org/wiki/Computer_file)\n",
|
670 | 752 | " * [Text file](https://en.wikipedia.org/wiki/Text_file)\n",
|
671 | 753 | " * [Binary file](https://en.wikipedia.org/wiki/Binary_file)\n",
|
672 |
| - " * [Filename extension](https://en.wikipedia.org/wiki/Filename_extension)" |
| 754 | + " * [Filename extension](https://en.wikipedia.org/wiki/Filename_extension)\n", |
| 755 | + "* [Absolute and relative paths](https://en.wikipedia.org/wiki/Path_(computing)#Absolute_and_relative_paths)" |
673 | 756 | ]
|
674 | 757 | },
|
675 | 758 | {
|
|