for dm #1
base: master
"source": [
"# Pandas and Excel\n",
"\n",
"Microsoft Excel is a spreadsheet software, containing data in tabular form. Entries of the data are located in cells, with numbered rows and letter labeled columns. Excel is wide spread across industries and has been around for over thirty years. It is often people's first introduction to data analysis. \n",
I think "widespread" should be one word.
yes, thanks!
"\n",
"Most users feel at home using a GUI to operate Excel and no programming is necessary for the most commonly used features. The data is presented right in front of the user and it is easy to scroll around through the spreadsheet. Making plots from the data only involves highlighting cells in the spreadsheet and clicking a few buttons.\n",
"\n",
"There are various short comings with Excel. It is closed source and not free. There are free open-source alternatives like OpenOffice and LibreOffice suites, but there might be compatibility issues between file formats, especially for complex spreadsheets. Excel becomes unstable for files reaching 500 MB, being unresponsiveness and crashing for large files, hindering productivity. Collaborations can become difficult because it is hard to inspect the spreadsheet and understand how certain values are calculated/populated. It is difficult to understand the user's thought process and work flow for the analysis.\n",
I think "shortcomings" should be one word as well.
yes, thanks!
"Excel can become unresponsive and crash for files exceeding 500 MB, hindering productivity." ???
Not sure what the problem with this is. Is it confusing? I was reading that Excel has trouble with large files (500 MB). The effects are that the program crashes or becomes slow (unresponsive). Let me know what you meant.
"sections = pd.read_csv('csv/sections.csv', delimiter=',')\n",
"\n",
"# print the top five entires of the DataFrame\n",
"exam_one.head()"
I'm wondering if it's also worth printing out sections.head() too? Up to you though.
I'll keep it as is.
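For reference, a minimal sketch of what printing both tables could look like. The in-memory CSV text and the `Name` and `Score` column names are assumptions made for illustration; the notebook's actual files under `csv/` may differ.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the notebook's CSV files (contents are assumed).
exam_one_csv = io.StringIO("Name,Score\nAlice,90\nBob,82\nCara,75\n")
sections_csv = io.StringIO("Name,Section\nBob,B\nAlice,A\nCara,A\n")

# read_csv accepts a path or any file-like object.
exam_one = pd.read_csv(exam_one_csv, delimiter=',')
sections = pd.read_csv(sections_csv, delimiter=',')

# head() returns at most the first five rows of each DataFrame.
print(exam_one.head())
print(sections.head())
```

Showing both heads makes it easy to see that the rows of the two tables need not be in the same order.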
"source": [
"## Vlookup\n",
"\n",
"Experienced Excel users rely on Vlookup, a built-in function that searches (looks up) a specified value in one column and returns the corresponding value of another column. For our example of exam scores, we would like to take a student's second exam score and include it into the table of first exam score. The column of student names may not be in the same order, e.g., the first name in one table may not correpsond to the first name in another table.\n",
This is just me being picky (and feel free to ignore this suggestion), but I would reword this sentence as, "Let us create a table that displays students' scores for the first two exams."
your approach is clearer
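As a rough illustration of the lookup being described, here is a sketch that uses `DataFrame.merge` to combine two exam tables by student name. Using `merge` here is an assumption about how the notebook does it, and the `Name`, `Exam 1`, and `Exam 2` column names and the scores are hypothetical.

```python
import pandas as pd

# Hypothetical data; note that the names are in a different order in each table.
exam_one = pd.DataFrame({"Name": ["Alice", "Bob", "Cara"], "Exam 1": [90, 82, 75]})
exam_two = pd.DataFrame({"Name": ["Cara", "Alice", "Bob"], "Exam 2": [88, 94, 70]})

# Match rows on the student name, much as VLOOKUP matches on a key column.
# how="left" keeps every row of exam_one, in its original order.
exams = exam_one.merge(exam_two, on="Name", how="left")
print(exams)
```

Because the join is keyed on `Name`, the differing row order in the two tables does not matter.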
"source": [
"## Pivot Tables\n",
"\n",
"Pivot tables are another useful tool in Excel. It allows users to perform data aggregation; a new table is created that is a summary based on grouping of certain selected columns. Pivot tables can also be used to filter out rows from a table. In pandas, we can easily filter out rows from our `DatFrame` by using Boolean logic. For this example, we would like to determine the student's name that belong to section \"A\". This is done in pandas by first creating an array of True/False values. This array corresponds to which rows met the condition. We then use the resulting Boolean array to only call rows that meet our condition."
"It" in "It allows" should be "They" (plural for pivot tables). The sentences are also a little confusing. Maybe consider wording: "Pivot tables are used to aggregate and filter data. We can group data by certain values in a given column and we can filter out rows using boolean logic..." ??
subject verb agreement -- I always mess that up. Conjugation in English (in my opinion) is very subtle. I'll make this change and make the text clearer.
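A small sketch of the Boolean filtering step may help make the rewritten text concrete. The `Section` column comes from the notebook text; the `Name` column and the data are assumed for illustration.

```python
import pandas as pd

# Hypothetical section assignments.
sections = pd.DataFrame({"Name": ["Alice", "Bob", "Cara"],
                         "Section": ["A", "B", "A"]})

# Step 1: build a Boolean Series, one True/False value per row.
in_section_a = sections["Section"] == "A"

# Step 2: index the DataFrame with it to keep only the rows marked True.
section_a = sections[in_section_a]

print(in_section_a.tolist())
print(section_a["Name"].tolist())
```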
"cell_type": "markdown",
"metadata": {},
"source": [
"For our example, we would like to calculate the mean score for each exam based on each section. There are two methods to peform pivot tables in pandas, using `pivot_table` or `group_by` method. Using the `pivot_table` method, the syntax is"
"based on each" --> "for each"?
"peform" --> "perform"
"There are two methods to peform pivot tables in pandas, using `pivot_table` or `group_by` method." --> "We can create pivot tables in pandas by using either the `pivot_table` or `group_by` method."
agreed
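For context, a minimal sketch of the `pivot_table` call this sentence introduces, computing the mean score per section. The data and the `Exam 1` / `Exam 2` column names are hypothetical. Note also that the pandas grouping method is spelled `groupby`, not `group_by`.

```python
import pandas as pd

# Hypothetical combined table of sections and exam scores.
scores = pd.DataFrame({
    "Name": ["Alice", "Bob", "Cara", "Dan"],
    "Section": ["A", "B", "A", "B"],
    "Exam 1": [90, 82, 70, 78],
    "Exam 2": [94, 70, 86, 90],
})

# One row per section, one column per exam, each cell holding the mean score.
means = scores.pivot_table(index="Section",
                           values=["Exam 1", "Exam 2"],
                           aggfunc="mean")
print(means)
```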
"editable": true
},
"source": [
"In the above code, the new index was the former `Section` column and the `aggfunc` is the operation we want to perform. An alternate approach is to utilize the `groupby` method, akin to the `GROUP BY` statement in SQL."
"was" --> "is" or rather "corresponds to"
agreed
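A sketch of the `groupby` alternative mentioned in this cell, under the same assumptions as before (hypothetical data, `Exam 1` / `Exam 2` column names). It is roughly analogous to `SELECT Section, AVG(...) ... GROUP BY Section` in SQL.

```python
import pandas as pd

# Hypothetical combined table of sections and exam scores.
scores = pd.DataFrame({
    "Section": ["A", "B", "A", "B"],
    "Exam 1": [90, 82, 70, 78],
    "Exam 2": [94, 70, 86, 90],
})

# Group rows by section; the grouping column becomes the index of the result.
# The dictionary maps each column to the aggregation applied to it.
means = scores.groupby("Section").agg({"Exam 1": "mean", "Exam 2": "mean"})
print(means)
```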
"cell_type": "markdown",
"metadata": {},
"source": [
"In the above code, after we applied `groupby`, we then used the `agg` method and passed a Python dictionary. The keys of the dictionary are the columns to apply the aggregation and the values are the actual aggregation function. If wanted to apply different or more than one aggregation functions for each column, we can pass a dictionary but with the Python lists as the values for the keys."
"If wanted to apply different or more than one aggregation functions for each column, we can pass a dictionary but with the Python lists as the values for the keys." --> "want...function to...dictionary whose values consist of lists of aggregation functions"
agreed
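To illustrate the "dictionary whose values consist of lists" case, a short sketch with hypothetical data and column names. Each column gets its own list of aggregation functions, and the result has two-level column labels.

```python
import pandas as pd

# Hypothetical combined table of sections and exam scores.
scores = pd.DataFrame({
    "Section": ["A", "B", "A", "B"],
    "Exam 1": [90, 82, 70, 78],
    "Exam 2": [94, 70, 86, 90],
})

# Different aggregations per column: mean and max for one exam, min for the other.
summary = scores.groupby("Section").agg({
    "Exam 1": ["mean", "max"],
    "Exam 2": ["min"],
})
print(summary)

# Columns are (column, function) pairs, so a single value is selected with a tuple.
print(summary.loc["A", ("Exam 1", "max")])
```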
"source": [
"## Quick Introduction to pandas\n",
"\n",
"The equivalent to an Excel spreadsheet in pandas is the `DataFrame` class. It looks like a spreadsheet, with rows, columns, and indices. For this article, we will exam a case of three spreadsheets, with the first two containing information on a student's exam score for a particular exam and the final spreadsheet has information on which section the students belongs. These `DataFrames` are loaded into memory from CSV using the `read_csv` function."
"We will exam" --> "We will examine" or better yet "Let us consider three spreadsheets -- the first two containing each student's grade on an exam and the third..."
thanks for catching this.
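A brief sketch of the spreadsheet analogy in this paragraph, showing the rows, columns, and index of a `DataFrame`. The table contents and column names are made up for illustration.

```python
import pandas as pd

# A hypothetical exam table, built directly instead of being read from CSV.
exam = pd.DataFrame({"Name": ["Alice", "Bob", "Cara"], "Score": [90, 82, 75]})

print(exam.shape)             # (number of rows, number of columns)
print(exam.columns.tolist())  # column labels, like spreadsheet headers
print(exam.index.tolist())    # row labels, by default 0, 1, 2, ...
```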
No description provided.