for dm #1

xofbd · 2017-10-20T20:28:18Z

No description provided.

dmastropole · 2017-10-20T20:35:59Z

pandas_vs_excel/pandas_vs_excel.ipynb

+   "source": [
+    "# Pandas and Excel\n",
+    "\n",
+    "Microsoft Excel is a spreadsheet software, containing data in tabular form. Entries of the data are located in cells, with numbered rows and letter labeled columns. Excel is wide spread across industries and has been around for over thirty years. It is often people's first introduction to data analysis. \n",


I think "widespread" should be one word.

yes, thanks!

dmastropole · 2017-10-20T20:36:42Z

pandas_vs_excel/pandas_vs_excel.ipynb

+    "\n",
+    "Most users feel at home using a GUI to operate Excel and no programming is necessary for the most commonly used features. The data is presented right in front of the user and it is easy to scroll around through the spreadsheet. Making plots from the data only involves highlighting cells in the spreadsheet and clicking a few buttons.\n",
+    "\n",
+    "There are various short comings with Excel. It is closed source and not free. There are free open-source alternatives like OpenOffice and LibreOffice suites, but there might be compatibility issues between file formats, especially for complex spreadsheets. Excel becomes unstable for files reaching 500 MB, being unresponsiveness and crashing for large files, hindering productivity. Collaborations can become difficult because it is hard to inspect the spreadsheet and understand how certain values are calculated/populated. It is difficult to understand the user's thought process and work flow for the analysis.\n",


I think "shortcomings" should be one word as well.

yes, thanks!

dmastropole · 2017-10-20T20:38:41Z

pandas_vs_excel/pandas_vs_excel.ipynb

+    "\n",
+    "Most users feel at home using a GUI to operate Excel and no programming is necessary for the most commonly used features. The data is presented right in front of the user and it is easy to scroll around through the spreadsheet. Making plots from the data only involves highlighting cells in the spreadsheet and clicking a few buttons.\n",
+    "\n",
+    "There are various short comings with Excel. It is closed source and not free. There are free open-source alternatives like OpenOffice and LibreOffice suites, but there might be compatibility issues between file formats, especially for complex spreadsheets. Excel becomes unstable for files reaching 500 MB, being unresponsiveness and crashing for large files, hindering productivity. Collaborations can become difficult because it is hard to inspect the spreadsheet and understand how certain values are calculated/populated. It is difficult to understand the user's thought process and work flow for the analysis.\n",


"Excel can become unresponsive and crash for files exceeding 500 MB, hindering productivity." ???

I'm not sure what the problem is here. Is the sentence confusing? I read that Excel has trouble with large files (around 500 MB): the program either crashes or becomes slow and unresponsive. Let me know what you meant.

dmastropole · 2017-10-20T20:39:52Z

pandas_vs_excel/pandas_vs_excel.ipynb

+    "sections = pd.read_csv('csv/sections.csv', delimiter=',')\n",
+    "\n",
+    "# print the top five entires of the DataFrame\n",
+    "exam_one.head()"


I'm wondering if it's worth printing out sections.head() too? Up to you though.

I'll keep it as is.

dmastropole · 2017-10-20T20:44:36Z

pandas_vs_excel/pandas_vs_excel.ipynb

+   "source": [
+    "## Vlookup\n",
+    "\n",
+    "Experienced Excel users rely on Vlookup, a built-in function that searches (looks up) a specified value in one column and returns the corresponding value of another column. For our example of exam scores, we would like to take a student's second exam score and include it into the table of first exam score. The column of student names may not be in the same order, e.g., the first name in one table may not correpsond to the first name in another table.\n",


This is just me being picky (and feel free to ignore this suggestion), but I would reword this sentence as, "Let us create a table that displays students' scores for the first two exams."

your approach is clearer
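
For reference, the Vlookup-style lookup discussed in this cell could be sketched in pandas roughly as follows. The column names (`Name`, `Exam 1`, `Exam 2`) and the data are made up for illustration and are not taken from the notebook's CSV files.

```python
import pandas as pd

# Hypothetical tables; the notebook's real columns may differ.
exam_one = pd.DataFrame({"Name": ["Ann", "Bob", "Cy"], "Exam 1": [90, 75, 82]})
exam_two = pd.DataFrame({"Name": ["Cy", "Ann", "Bob"], "Exam 2": [88, 95, 70]})

# merge matches rows on the shared key, so the two tables
# do not need to list the students in the same order.
scores = exam_one.merge(exam_two, on="Name", how="left")
print(scores)
```

A left merge keeps every row of the first table, which mirrors looking up each student's second score from the first-exam sheet.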

dmastropole · 2017-10-20T20:53:44Z

pandas_vs_excel/pandas_vs_excel.ipynb

+   "source": [
+    "## Pivot Tables\n",
+    "\n",
+    "Pivot tables are another useful tool in Excel. It allows users to perform data aggregation; a new table is created that is a summary based on grouping of certain selected columns. Pivot tables can also be used to filter out rows from a table. In pandas, we can easily filter out rows from our `DatFrame` by using Boolean logic. For this example, we would like to determine the student's name that belong to section \"A\". This is done in pandas by first creating an array of True/False values. This array corresponds to which rows met the condition. We then use the resulting Boolean array to only call rows that meet our condition."


"It" in "It allows" should be "They" (plural for pivot tables). The sentences are also a little confusing. Maybe consider this wording: "Pivot tables are used to aggregate and filter data. We can group data by certain values in a given column and we can filter out rows using Boolean logic..."?

subject verb agreement -- I always mess that up. Conjugation in English (in my opinion) is very subtle. I'll make this change and make the text clearer.
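
A small sketch of the Boolean filtering step described in this cell, in case it helps when rewording. The `Name` and `Section` columns and their values are assumed for illustration.

```python
import pandas as pd

# Hypothetical section assignments.
sections = pd.DataFrame({"Name": ["Ann", "Bob", "Cy"], "Section": ["A", "B", "A"]})

# Step 1: build a Boolean Series, True where the row meets the condition.
in_section_a = sections["Section"] == "A"

# Step 2: use the Boolean Series to select only the matching rows.
section_a_names = sections.loc[in_section_a, "Name"]
print(section_a_names)
```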

dmastropole · 2017-10-20T20:59:39Z

pandas_vs_excel/pandas_vs_excel.ipynb

+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For our example, we would like to calculate the mean score for each exam based on each section. There are two methods to peform pivot tables in pandas, using `pivot_table` or `group_by` method. Using the `pivot_table` method, the syntax is"


"based on each" --> "for each"?
"peform" --> "perform"
"There are two methods to peform pivot tables in pandas, using pivot_table or group_by method." --> "We can create pivot tables in pandas by using either the pivot_table or groupby method." (Note that the pandas method is spelled groupby, without an underscore.)

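To illustrate the two approaches this cell mentions, here is a minimal sketch that computes the mean score per section both ways. The DataFrame and its column names are assumptions for the example, not the notebook's data.

```python
import pandas as pd

# Hypothetical merged table of scores and sections.
scores = pd.DataFrame({
    "Section": ["A", "B", "A", "B"],
    "Exam 1": [90, 70, 80, 60],
    "Exam 2": [85, 75, 95, 65],
})

# Approach 1: pivot_table, with Section as the new index.
by_pivot = scores.pivot_table(index="Section", values=["Exam 1", "Exam 2"], aggfunc="mean")

# Approach 2: groupby followed by an aggregation.
by_group = scores.groupby("Section")[["Exam 1", "Exam 2"]].mean()

print(by_pivot)
print(by_group)
```

Both calls should produce the same per-section means; which one reads better is largely a matter of taste.
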
dmastropole · 2017-10-20T21:01:09Z

pandas_vs_excel/pandas_vs_excel.ipynb

+    "editable": true
+   },
+   "source": [
+    "In the above code, the new index was the former `Section` column and the `aggfunc` is the operation we want to perform. An alternate approach is to utilize the `groupby` method, akin to the `GROUP BY` statement in SQL."


"was" --> "is" or rather "corresponds to"

dmastropole · 2017-10-20T21:04:43Z

pandas_vs_excel/pandas_vs_excel.ipynb

+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In the above code, after we applied `groupby`, we then used the `agg` method and passed a Python dictionary. The keys of the dictionary are the columns to apply the aggregation and the values are the actual aggregation function. If wanted to apply different or more than one aggregation functions for each column, we can pass a dictionary but with the Python lists as the values for the keys."


"If wanted to apply different or more than one aggregation functions for each column, we can pass a dictionary but with the Python lists as the values for the keys." --> "If we want to apply different aggregation functions, or more than one, to each column, we can pass a dictionary whose values are lists of aggregation functions."

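A short sketch of passing a dictionary of lists to `agg`, as described in this cell. The data and column names are again hypothetical.

```python
import pandas as pd

# Hypothetical merged table of scores and sections.
scores = pd.DataFrame({
    "Section": ["A", "B", "A", "B"],
    "Exam 1": [90, 70, 80, 60],
    "Exam 2": [85, 75, 95, 65],
})

# Keys are the columns to aggregate; values are lists of aggregation functions.
summary = scores.groupby("Section").agg({
    "Exam 1": ["mean", "max"],
    "Exam 2": ["min"],
})
print(summary)
```

The result has two-level column labels, one level for the original column and one for the aggregation applied to it.
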
dmastropole · 2017-10-20T21:11:10Z

pandas_vs_excel/pandas_vs_excel.ipynb

+   "source": [
+    "## Quick Introduction to pandas\n",
+    "\n",
+    "The equivalent to an Excel spreadsheet in pandas is the `DataFrame` class. It looks like a spreadsheet, with rows, columns, and indices. For this article, we will exam a case of three spreadsheets, with the first two containing information on a student's exam score for a particular exam and the final spreadsheet has information on which section the students belongs. These `DataFrames` are loaded into memory from CSV using the `read_csv` function."


"We will exam" --> "We will examine" or better yet "Let us consider three spreadsheets -- the first two containing each student's grade on an exam and the third..."

thanks for catching this.
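
For readers of this thread, a minimal sketch of loading a CSV into a `DataFrame` with `read_csv`. It reads from an in-memory string so it runs without the notebook's files; the column names and rows are assumptions.

```python
import io
import pandas as pd

# Stand-in for a file such as the notebook's exam CSVs (contents are made up).
csv_text = "Name,Score\nAnn,90\nBob,75\n"

# read_csv accepts a path or any file-like object.
exam_one = pd.read_csv(io.StringIO(csv_text))

# head() shows the first rows (up to five by default).
print(exam_one.head())
```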

copy directories to repo

7350684

dmastropole reviewed Oct 20, 2017


xofbd added 7 commits October 24, 2017 19:39

add suggestions from reviewer

bad8a3d

add minor text changes

41773ad

add text on groupby and apply

384e682

add concluding remarks

fede239

fix typo

7b71a52

remove old pandas_vs_PostgreSQL directory

d1c5583

add pandas_vs_PostgreSQL directory and files

08a364e
