GitHub - clarecorthell/data-prototyping-talk: Data Prototyping with Python (Slides & Resources from talk)

Follow me on twitter @clarecorthell!

September 9 2014 @ Hackbright Academy, SF

Prototyping in the Data World - Data Scripting Skills

Tools

numpy - multi-dimensional container of data
pandas - data structures and analysis tools
matplotlib - Python plotting library
IPython - browser-based code notebook / IDE (run blocks of code, not the whole program)

Notes accompanying the talk [Video & Slides]

All Python code for this talk was run in the browser-based IPython notebook.

Import tools

import numpy as np
import pandas as pd

# Render our plots inline
%matplotlib inline
import matplotlib.pyplot as plt

Get Data

turn a CSV file into a DataFrame (for example, an export from Excel in CSV form)

mattermark_df = pd.read_csv('mattermark_data.csv') => Mattermark data about funding rounds in New York City in the last five years
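The Mattermark export itself is not part of these notes, so here is a minimal sketch of the same loading step using a small in-memory CSV. The rows and the `sample_df` name are made up for illustration; only the column names follow the ones used in the talk.

```python
import io
import pandas as pd

# Hypothetical stand-in for mattermark_data.csv; these rows are invented.
csv_text = """name,series,amount,cached_uniques,cached_mobile_downloads
Acme,a,5000000,12000,
Birch,seed,750000,3000,1500
Cobalt,b,,45000,22000
Dune,seed,1200000,800,
"""

# read_csv accepts a file path or any file-like object
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.shape)   # (rows, columns)
print(sample_df.dtypes)  # empty cells become NaN, so those columns load as floats
```

Checking `dtypes` right after loading is a quick way to spot columns that were parsed as text or that contain missing values.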

What's in here?

sample different parts of the data

mattermark_df[:10] sample the first ten rows of our DataFrame

mattermark_df.iloc[0] use .iloc to index into row location 0

mattermark_df['cached_uniques'] sample the column

mattermark_df['cached_uniques'].describe() show some standard statistics about that column (for numeric data)

mattermark_df.describe() show some standard statistics about all numeric columns

mattermark_df.sort_values('amount', ascending=False) sort entire table (descending) by amount of funding
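The inspection steps above can be tried on a toy table. This is a sketch with invented rows, and it assumes a current pandas release, where sorting by a column is done with `sort_values` (the older `DataFrame.sort` method is no longer available).

```python
import pandas as pd

# Hypothetical rows; only the column names follow the talk.
df = pd.DataFrame({
    "name": ["Acme", "Birch", "Cobalt", "Dune"],
    "amount": [5000000, 750000, 20000000, 1200000],
    "cached_uniques": [12000, 3000, 45000, 800],
})

first_row = df.iloc[0]            # a Series holding the row at position 0
stats = df["amount"].describe()   # count, mean, std, min, quartiles, max
ranked = df.sort_values("amount", ascending=False)  # largest rounds first

print(ranked)
```

`sort_values` returns a new DataFrame and leaves `df` untouched, which is convenient when poking at data in a notebook.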

What's not in here?

mattermark_df['amount'].isnull() in the column, is the value at a given index null? (True or False)

len(np.where(mattermark_df['amount'].isnull())[0]) Count the number of null values in the column
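As a sketch of the null count on made-up values: because the boolean mask from `.isnull()` sums as 0s and 1s, `.sum()` gives the same count as the `np.where` approach, and `.mean()` gives the fraction missing. The `amount` Series here is a hypothetical stand-in for the real column.

```python
import numpy as np
import pandas as pd

# Hypothetical funding amounts with two missing entries
amount = pd.Series([5000000, np.nan, 750000, np.nan, 1200000], name="amount")

mask = amount.isnull()                 # True where the value is missing

count_where = len(np.where(mask)[0])   # the approach from the talk
count_sum = int(mask.sum())            # True counts as 1, so the sum is the null count
missing_fraction = mask.mean()         # share of rows with no amount

print(count_where, count_sum, missing_fraction)
```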

Ask a few Questions (which lead to other questions)

What is the most common stage for funding?

mattermark_df['series'].value_counts() count the values in each category

mattermark_df['series'].value_counts().plot(kind='bar') plot in a bar graph (grouped by series) to get a quick idea of relative scale
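A minimal sketch of `value_counts` on an invented `series` column. It also shows `normalize=True`, which reports shares instead of raw counts; that option is an addition here, not something covered in the talk notes.

```python
import pandas as pd

# Hypothetical funding stages; None marks a round with no recorded stage
series_col = pd.Series(["seed", "a", "seed", "b", "a", "seed", None], name="series")

counts = series_col.value_counts()                # most frequent first; missing values are skipped
shares = series_col.value_counts(normalize=True)  # fractions of the non-missing rows

print(counts)
print(shares)
```

Since missing values are excluded by default, the counts add up to the number of rows that have a stage, not to the length of the table.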

Leads to Question: What is the typical funding amount by round?

by_series = mattermark_df.groupby('series') group records by series column (stored in a variable)

print(by_series['amount'].mean().astype(int)) within each grouping, calculate the mean (and do some explicit type conversion)
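Here is a sketch of the group-then-aggregate step on invented rows. The median is included alongside the mean as one possible reading of "typical", since a single very large round can pull the mean up; that comparison is an assumption of this example rather than part of the talk.

```python
import numpy as np
import pandas as pd

# Hypothetical rounds; one amount is missing on purpose
df = pd.DataFrame({
    "series": ["seed", "seed", "a", "a", "b"],
    "amount": [500000, 1500000, 4000000, np.nan, 20000000],
})

by_series = df.groupby("series")

# mean() skips missing values within each group
mean_amount = by_series["amount"].mean().astype(int)
median_amount = by_series["amount"].median()

print(mean_amount)
print(median_amount)
```

Note that `.astype(int)` raises an error if a group has no amounts at all, because its mean is NaN and cannot be converted to an integer.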

How many of these are mobile companies?

mobile_df = mattermark_df.dropna(subset=['cached_mobile_downloads']) we make the brash inference that a company without a monthly count of mobile downloads doesn't have a mobile application. Using the .dropna function, we get rid of the rows that have no value in that column.

mattermark_df.shape
mobile_df.shape

compare the shapes of the two DataFrames to see how many companies (rows) have mobile app data, which gives a rough proportion
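The shape comparison can be turned into an explicit proportion. This sketch uses invented rows, and the `proportion` variable is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical companies; NaN means no mobile download count was recorded
df = pd.DataFrame({
    "name": ["Acme", "Birch", "Cobalt", "Dune"],
    "cached_mobile_downloads": [np.nan, 1500, 22000, np.nan],
})

# Keep only the rows that have a value in the chosen column
mobile = df.dropna(subset=["cached_mobile_downloads"])

# shape is (rows, columns); the ratio of row counts is the rough proportion
proportion = float(mobile.shape[0]) / df.shape[0]

print(df.shape, mobile.shape, proportion)
```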

(For more context, see the video & slides.)

Great Resources for Getting Started

The Open Source Data Science Masters - A curated curriculum of open source resources to get you working with and understanding data
pandas cookbook - great beginning resource from Julia Evans
Python for Data Analysis / Book - the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python (with numpy, pandas, and matplotlib)
