Skip to content

File Input and Output

Anthony Graca edited this page Dec 31, 2017 · 11 revisions

Overview

Variables do not stay after your program is done. Files can help save data after your program is finished. A file is just a large string.

New Concepts

File Paths and File Extensions

A file can be referred to in two ways, its filename and its path. The path is the location of the file while the filename is the name of the file. For example, if I was the US President and I had the nuclear launch codes, the path would probably be something like;

C:\Users\Trump\Documents\nuclear_launch_codes.txt

In this case, the filename of the launch codes is "nuclear_launch_codes.txt" and the "Users", "Trump", and "Documents" are known as folders or directories. the C:\ in Windows is called the root folder. Root means the base or the lowest you can go, like a plant. The root folder contains everything. In Linux or macOS, the root folder is /. Windows and Unix-based operating systems deal with files different. Feel free to ask questions.

As a note, USB Drives are represented differently in Windows and Unix-based operating systems. In Windows, USB Drives are shown with an entirely different root folder like D:\ or E:. In Linux and macOS, the drives are attached to the root folder, sometimes under mnt.

(Show Tree structure of a folder hierarchy)

Backslash on Windows and Forward Slash on macOS and Linux

Another difference is that Windows paths are written with backslashes (). As macOS and Linux uses forward slashes (/). To make python work on all OS-es, you will need to handle both cases, which will be shown further down.

A helpful function is os.path.join(), it will return a string with a file path using the correct path seperators

  >> import os
  >> path = os.path.join('usr', 'bin', 'spam')
  'usr\\bin\\spam' # on windoge
  'usr/bin/spam    # on best OS

You need double back slashes when using Windows because you need to use an escape character smh

Current Working Directory (directory is another name for folder)

Follow along with the Python interactive shell. You may or may not get the same output. Do not worry, we are more concerned with you trying out the functions so you can see what they do.

os.getcwd() - gets current working directory

os.chdir(path) - changes the directory

  >> import os
  >> os.getcwd()
  'C:\\Python34'
  >> os.chdir('C:\\Windows\\System32')
  >> os.getcwd()
  'C:\\Windows\\System32'
  >> os.chdir('C:\\RandomAFLocation')

  FileNotFoundError

Absolute vs. Relative Paths

An absolute path is a path that starts from root. A relative starts from working directory.

For example, C:\Users\Trump\Documents\nuclear_launch_codes.txt is the absolute path. While if I am in the Trump folder, .\Documents\nuclear_launch_codes.txt is the relative path. We can try this out in the windows command line.

Show graphic/example to compare and contrast

There are two shortcuts to represent this directory and the parent folder. We will be using these a lot.

. (dot) for this directory

.. (dot-dot) for parent folder

Creating New Folders

Use os.makedirs(path) to create a folder

The os.path Module

Whenever we are working with paths, we will be using the os.path module. To import this, we only need to type in "import os". This module gives us a few additional functions that can help us. Try to follow along with the Python interactive shell

Handling Abssolute and Relative path

os.path.abspath(path) - returns a string with the absolute path of the argument

  >> os.path.abspath('.')

os.path.isabs(path) - returns True if argument is an absolute path

  >> os.path.isabs('.') # Why is this sometimes false?
  False
  >> os.path.isabs(os.path.abspath('.')) # Why is this true?
  True

os.path.relpath(path, start) - returns a string of a relative path from the start to path. If a start is not given, the current working directory is used

  >> os.path.relpath('C:\\Windows', 'C:\\')
  'Windows'

os.path.dirname(path) will return a string of everything that comes before the last slash os.path.basename(path) will return a string of everything that comes after the last slash

  >> path = 'C:\\Windows\\System32\\calc.exe'
  >> os.path.basename(path)
  'calc.exe'
  >> os.path.dirname(path)
  'C:\\Windows\\System32'

os.path.split(path) takes a path and returns a tuple of the base + dirname

  >> path = 'C:\\Windows\\System32\\calc.exe'
  >> os.path.split(path)
  ('C:\\Windows\\System32', 'calc.exe')

Finding File Sizes and Folder contents

os.path.getsize(path) returns the size in bytes of the file os.listdir(path) returns a list of strings for each file in the path

>> totalSize = 0
>> # What does this for-loop do?
>> for filename in os.listdir('.')
     totalSize = totalSize + os.path.getsize(os.path.join('.', filename))
>> print(totalSize) 

Checking path Validity

Python will crash if you provide a path that does not exist. Luckily, the os.path modules has some functions for us to prevent that.

os.path.exists(path) - returns True if the file or folder exists and returns False if it doesn't. os.path.isfile(path) - returns True if the path points to a file and returns False if it doesn't. os.path.isdir(path) - returns True if the path points to a folder and returns False if it doesn't.

Reading and Writing Files

We need to be comfortable with all the functions above so we can tell Python exactly the location of the files we want to read and write. The files we will be working on are plaintext files, which contain only normal human-readable text. Examples of plaintext files are files that end with .txt, or .py. We will not be working with binary files which are only computer readable files. If we tried to open a binary file to read, we will get garbled text.

We will be using three functions to read plain text files,

open() - returns a File object. Passing 'w' allows you to write into the file. Passing 'a' allows us to append into the object. We are going to be calling the next few methods on our file object.

read() or write() - reads from or writes to the file

close() - closes the File object which allows Python to save the file

Let's create our own files with python. Follow along with the Python interactive shell. Our code will work with any operating system.

>> import os
>> path = os.path.join('.', 'newFile.txt')  # Why does this work everytime?
>> myFile = open(path, 'w')
>> myFile.write('Hello [insert name]\n')
>> myFile.close()

>> helloFile = open(path, 'r')
>> print(helloFile.read())
Hello [insert name]
>> helloFile.close()

>> editFile = open(path, 'a') #if we just passed 'w', we would overwrite our data
>> editFile = write('I like [insert food]')
>> editFile = close()

>> readFile = open(path)
>> print(readFile.read())
Hello [insertname]
I like [insert food]
>> readFile.close()

Activity

You are a geography teacher with 35 students in your class and you want to give a quiz on US state capitals. The point of this quiz is that you want to know generally where students are at on the material so you know whether you should advance to newer material or you should spend more time on review to make sure everyone is caught up. However, you were a high schooler before and you know some students will cheat off of each other as it's not hard to do so in a geography test. This means it's hard to get a good read on how the class is doing

Luckily, you know some Python programming and you just learned how to do file input and output. You decided to randomize the order of questions so that each quiz is unique, making it impossible for a student to copy off of another.

This is what the program does:

  • Creates 35 different quizzes
  • Creates 50 multiple-choice questions for each quiz, in random order.
  • Provides the correct answer and three random wrong answers for each question, in random order.
  • Writes the quizzes to 35 text files.
  • Writes the answer keys to 35 text files.

How is this done? The code will:

  • Store the states and their capitals in a dictionary
  • Call open(), write(), and close() for the quiz and answer key text files
  • Use random.shuffle() to randomize the order of the questions and multiple-choice options

Break down the problem into steps:

  • Step 1: Store the Quiz Data in a Dictionary (You will be given a file that has all of the states and capitals
  • Step 2: Create the Quiz File and Shuffle the Question Order
  • Step 3: Create the Answer Options
  • Step 4: Write Content to the Quiz and Answer Key Files

Also make sure as the teacher you tell the kids that want to cheat, "Git Rekt M8"

Example of output

List of all US states and capitals

Solution

Implementation

Summary

Files and Folders are organized in a tree like structure. The path describes the location of the file or folder. The path can either be described with an absolute path or a relative path. The os.path module has a lot of functions we can use to manipulate file paths.

We can directly read and write plain text files. We first need to use the open() function to give us a File object. With the File object, we are able to read() from the file and write() to it. the open() function can be passed a 'read', 'write', or 'append' argument.

Knowing how to manipulate files is useful because we are now able to store data after the program is finished and when the computer restarts. From this foundation, we can learn how to manipulate whole files themselves; copying, deleting, renaming, moving them, and so forth in Organizing Files

file.startswith() file.endswith()

Need list() dict.keys() dict.values() random.shuffle()