# python-data-science-intro
Course materials for Intro to Data Science with Python

## Getting started

### Step 1: Get git
The first thing you'll want to do is make sure you have git installed on your machine. Git is a version control system: software engineers and data scientists use it to track changes to their projects over time and to collaborate with teammates. In this class, we'll use git to keep track of the course materials and post solutions.

#### Mac OS
To check whether you already have git installed, open a terminal window and type `which git`. If git is installed, you'll see a file path printed to the screen; if nothing is printed, git isn't installed.

The easiest way to install git is with [homebrew](http://brew.sh/). Homebrew is a package manager that makes it quick and easy to download and install common programs that developers use. Again, check whether you already have homebrew installed by typing `which brew` in the terminal. If it's not installed, the [brew homepage](http://brew.sh/) has instructions. Once you have homebrew, you can install git by typing `brew install git`.
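
If you happen to have Python 3 available already, the same `which` checks can be sketched in Python with the standard library's `shutil.which`, which searches the directories on your `PATH` much like the shell command does. This is purely illustrative (the helper name `find_tools` is hypothetical); typing `which git` in the terminal is all you actually need.

```python
import shutil


def find_tools(names):
    """Map each program name to its full path, or None if it isn't on PATH."""
    # shutil.which searches the directories listed in PATH, like the shell's `which`
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(["git", "brew"]).items():
        if path:
            print(f"{name}: found at {path}")
        else:
            print(f"{name}: not installed (or not on your PATH)")
```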

#### Windows
The nicest way to use git on Windows is with [GitHub for Windows](https://desktop.github.com/). This will install both the command line program and a nice GUI.

#### Linux
Most Linux distributions ship with git. If yours doesn't, install it with your package manager of choice (e.g., `sudo apt install git` on Debian/Ubuntu or `sudo yum install git` on CentOS/RHEL).

### Step 2: Clone this class repository
When using git, you'll often want to grab a copy of code that's stored on a remote server. This is called *cloning* a repository, and it's how you'll get the course materials onto your computer.

#### Command line
If you're on Mac or Linux (or using the Windows command line), navigate to the local directory where you want the materials to live. Then type `git clone https://github.com/nickdavidhaynes/python-data-science-intro.git`. If you now type `ls` (or `dir` on Windows), you should see a directory called `python-data-science-intro`. This directory contains all of the course materials. Congrats!
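
A successful clone creates the project directory with a hidden `.git` folder inside it, where git keeps the repository's history. As a quick sanity check, here's a minimal Python 3 sketch that looks for that folder; it assumes you run it from the directory where you typed `git clone`, and `looks_like_clone` is a hypothetical helper name.

```python
from pathlib import Path


def looks_like_clone(repo_dir):
    """Return True if repo_dir exists and contains the .git folder that `git clone` creates."""
    repo = Path(repo_dir)
    return repo.is_dir() and (repo / ".git").is_dir()


if __name__ == "__main__":
    # Assumes the current working directory is where you ran `git clone`
    print(looks_like_clone("python-data-science-intro"))
```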

### Step 3: Get Python 3
This course will use Python 3. We'll touch on the differences between Python 2 and 3 later, but suffice it to say for now that Python 2 should only be used to maintain legacy code.
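
To confirm which version a given `python` command runs, a minimal sketch like the one below can be saved to a file and run with that command. It deliberately avoids Python-3-only syntax such as f-strings, so a Python 2 interpreter can still parse it and report the problem instead of failing with a syntax error; `require_python3` is a hypothetical helper, not part of any library.

```python
import sys


def require_python3(min_version=(3, 0)):
    """Raise RuntimeError if this interpreter is older than min_version."""
    if sys.version_info < min_version:
        raise RuntimeError(
            "This course needs Python 3, but you're running "
            + sys.version.split()[0]
        )
    # Return (major, minor) so callers can report it
    return tuple(sys.version_info[:2])


if __name__ == "__main__":
    major, minor = require_python3()
    print("Running Python {}.{}".format(major, minor))
```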

#### Mac OS and Linux
Your machine most likely already has Python installed. This is called *system Python*, and it's generally considered bad practice to use it for development work, since making changes to it could break other programs that rely on it (see the MacPython wiki's [Which Python](https://github.com/MacPython/wiki/wiki/Which-Python) page for details). Instead, we'll install a separate Python alongside the system Python.

I *strongly* recommend downloading and installing the [Anaconda distribution](https://www.continuum.io/downloads) of Python. Anaconda is a collection of common Python tools used heavily in data science and scientific computing. More importantly, it solves many of the installation headaches that often trip up beginners. Just download and install the file and *boom, you're done*.

#### Windows
I *strongly* recommend downloading and installing the [Anaconda distribution](https://www.continuum.io/downloads) of Python. Anaconda is a collection of common Python tools used heavily in data science and scientific computing. Some of these tools rely on dependencies that are hard to get working on Windows. Anaconda solves this problem for you by packaging everything you'll need into a single .exe installer. Just download and install it and *boom, you're done*.
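
Once Anaconda is installed, open a new terminal and check that `python` now points at Anaconda rather than the system Python. The sketch below prints the interpreter's path and guesses whether it's a conda install by looking for a `conda-meta` folder under `sys.prefix`; that folder check is a heuristic, not an official test, and `describe_interpreter` is a hypothetical name.

```python
import os
import sys


def describe_interpreter():
    """Report which Python executable is running and whether it looks conda-managed."""
    # Heuristic: conda installs (including Anaconda) keep package metadata
    # in a `conda-meta` folder under the interpreter's prefix.
    is_conda = os.path.isdir(os.path.join(sys.prefix, "conda-meta"))
    return {"executable": sys.executable, "prefix": sys.prefix, "conda": is_conda}


if __name__ == "__main__":
    info = describe_interpreter()
    print("Python executable:", info["executable"])
    print("Looks like Anaconda/conda:", info["conda"])
```

If the printed path is something like `/usr/bin/python`, you're most likely still using system Python.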

### Step 4: Get the necessary Python packages
Python has a very robust standard library; indeed, part of the Pythonic philosophy is to be "batteries included." Nonetheless, Python is used by so many different communities these days that there's far too much specialized software to ship with every Python download. Instead, this specialized code is distributed as packages that users can download and use in their projects.
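
As a small taste of "batteries included," the standard library's `statistics` module computes common summary statistics with nothing extra to install. The numbers here are made up purely for illustration.

```python
import statistics

# Illustrative data only
values = [2, 4, 4, 4, 5, 5, 7, 9]

print("mean:", statistics.mean(values))
print("median:", statistics.median(values))
print("population std dev:", statistics.pstdev(values))
```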

We'll make heavy use of the common Python data science libraries (sometimes called the PyData stack) in this course, in particular **numpy**, **scipy**, **matplotlib**, **scikit-learn**, and **Jupyter**.

The good news is that if you installed Anaconda as recommended in Step 3, you already have the PyData stack! If not, you *might* be able to use `pip`, Python's standard package manager. Note, however, that these data science libraries require some special dependencies that not every computer will have (and a Windows machine almost certainly doesn't). So, seriously: just use Anaconda.
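
To check whether the PyData stack is available to your current Python, a minimal sketch can ask `importlib.util.find_spec` whether each package can be found, without actually importing it. Import names don't always match package names (scikit-learn is imported as `sklearn`), and using the classic `notebook` package as a stand-in for Jupyter is an assumption here; `PYDATA_IMPORT_NAMES` and `check_pydata` are hypothetical names.

```python
import importlib.util

# Package name -> import name. scikit-learn imports as `sklearn`; the classic
# Jupyter Notebook server (assumed stand-in for Jupyter) imports as `notebook`.
PYDATA_IMPORT_NAMES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "Jupyter Notebook": "notebook",
}


def check_pydata(import_names=PYDATA_IMPORT_NAMES):
    """Map each package name to True if Python can locate it (without importing it)."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in import_names.items()
    }


if __name__ == "__main__":
    for package, found in check_pydata().items():
        print("{:<18} {}".format(package, "OK" if found else "MISSING"))
```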

### Step 5: Take a look at the pre-work
All of the notes and exercises for this class will be hosted in [Jupyter](http://jupyter.org/) notebooks. Jupyter notebooks are a key tool for communication and collaboration in data science. A notebook is a file that you open through Jupyter, which runs in a tab of your web browser. From this tab, you can write and run code (often Python, though R, Julia, and many other languages are supported as well) and intersperse it with text and images that explain what you're doing.
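
Under the hood, a notebook (`.ipynb`) file is plain JSON: a list of cells, each holding either Markdown text or code, plus some metadata. Here's a minimal sketch that writes a two-cell notebook using only the standard library, assuming version 4.4 of the notebook format; in practice you'd create notebooks from the Jupyter interface rather than by hand, and the names `minimal_notebook` and `hello.ipynb` are just examples.

```python
import json


def minimal_notebook():
    """Build the JSON structure of a two-cell notebook (nbformat 4.4)."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {
                # A Markdown cell holds explanatory text
                "cell_type": "markdown",
                "metadata": {},
                "source": ["## Hello, notebook\n", "Text and code can live side by side."],
            },
            {
                # A code cell holds runnable code; outputs are filled in when it runs
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": ["print(1 + 1)"],
            },
        ],
    }


if __name__ == "__main__":
    with open("hello.ipynb", "w") as f:
        json.dump(minimal_notebook(), f, indent=1)
```

Opening `hello.ipynb` from Jupyter should show one text cell followed by one code cell.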

I strongly encourage you to take a look at the pre-work notebook before coming to the first class. This notebook will give you an interactive environment to familiarize yourself with the basics of Python and should make the first day go a bit more smoothly.

To run the notebook...