Commit 0bdc507

authored
Docs : Fixed some typos (microsoft#388)
* Docs : Fixed some typos
* 01/02 requested changes made
* 02/02 requested changes made
1 parent 2d64b08 commit 0bdc507

4 files changed: 13 additions and 13 deletions

2-Regression/1-Tools/README.md

+1-1
Original file line number | Diff line number | Diff line change
@@ -110,7 +110,7 @@ Import some libraries to help with your tasks.
110110
from sklearn import datasets, linear_model, model_selection
111111
```
112112

113-
Above you are importing `matplottlib`, `numpy` and you are importing `datasets`, `linear_model` and `model_selection` from `sklearn`. `model_selection` is used for splitting data into training and test sets.
113+
Above you are importing `matplotlib`, `numpy` and you are importing `datasets`, `linear_model` and `model_selection` from `sklearn`. `model_selection` is used for splitting data into training and test sets.
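
As a minimal sketch of how `model_selection` is typically used for that split: the variable names, the `test_size` value and the `random_state` value below are illustrative assumptions, not taken from the lesson.

```python
from sklearn import datasets, model_selection

# Load the diabetes dataset bundled with scikit-learn as features X and target y
X, y = datasets.load_diabetes(return_X_y=True)

# Hold out a third of the samples for testing.
# test_size and random_state are arbitrary choices for this sketch.
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.33, random_state=0
)

print(X_train.shape, X_test.shape)
```

Every sample ends up in exactly one of the two sets, so the row counts of `X_train` and `X_test` add up to the row count of `X`.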
114114

115115
### The diabetes dataset
116116

5-Clustering/2-K-Means/README.md

+1-1
@@ -30,7 +30,7 @@ The K-Means clustering process [executes in a three-step process](https://scikit
3030
1. The algorithm selects k-number of center points by sampling from the dataset. After this, it loops:
3131
1. It assigns each sample to the nearest centroid.
3232
2. It creates new centroids by taking the mean value of all of the samples assigned to the previous centroids.
33-
3. Then, it calculates the difference between the new and old centroids and repeats until the centroids are stablized.
33+
3. Then, it calculates the difference between the new and old centroids and repeats until the centroids are stabilized.
3434

3535
One drawback of using K-Means includes the fact that you will need to establish 'k', that is the number of centroids. Fortunately the 'elbow method' helps to estimate a good starting value for 'k'. You'll try it in a minute.
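
The three-step loop described above can be sketched with NumPy roughly as follows. This is a simplified illustration rather than scikit-learn's implementation; the function name `k_means` and the `tol`, `max_iter` and `seed` parameters are assumptions made for the example.

```python
import numpy as np

def k_means(samples, k, tol=1e-6, max_iter=100, seed=0):
    """Illustrative K-Means loop; tol, max_iter and seed are assumed parameters."""
    rng = np.random.default_rng(seed)
    # Select k initial centroids by sampling distinct points from the dataset
    centroids = samples[rng.choice(len(samples), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 1: assign each sample to its nearest centroid
        distances = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 2: each new centroid is the mean of the samples assigned to it
        # (a centroid that lost all of its samples is left where it was)
        new_centroids = np.array([
            samples[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 3: measure how far the centroids moved and stop once they stabilize
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, labels

# Two well-separated blobs of made-up 2-D points
rng = np.random.default_rng(42)
points = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
centroids, labels = k_means(points, k=2)
print(centroids)
```

Note that `k` still has to be supplied by the caller, which is the drawback mentioned above.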
3636

6-NLP/4-Hotel-Reviews-1/README.md

+9-9
@@ -2,7 +2,7 @@
22

33
In this section you will use the techniques in the previous lessons to do some exploratory data analysis of a large dataset. Once you have a good understanding of the usefulness of the various columns, you will learn:
44

5-
- how to remove the unneeded columns
5+
- how to remove the unnecessary columns
66
- how to calculate some new data based on the existing columns
77
- how to save the resulting dataset for use in the final challenge
88

@@ -27,9 +27,9 @@ This challenge assumes that you are building a hotel recommendation bot using se
2727

2828
Using Python, a dataset of hotel reviews, and NLTK's sentiment analysis you could find out:
2929

30-
* what are the most frequently used words and phrases in reviews?
31-
* do the official *tags* describing a hotel correlate with review scores (e.g. are the more negative reviews for a particular hotel for *Family with young children* than by *Solo traveller*, perhaps indicating it is better for *Solo travellers*?)
32-
* do the NLTK sentiment scores 'agree' with the hotel reviewer's numerical score?
30+
* What are the most frequently used words and phrases in reviews?
31+
* Do the official *tags* describing a hotel correlate with review scores (e.g. are the more negative reviews for a particular hotel for *Family with young children* than by *Solo traveller*, perhaps indicating it is better for *Solo travellers*?)
32+
* Do the NLTK sentiment scores 'agree' with the hotel reviewer's numerical score?
3333

3434
#### Dataset
3535

@@ -82,17 +82,17 @@ Here they are grouped in a way that might be easier to examine:
8282
**Reviewer columns**
8383

8484
- `Total_Number_of_Reviews_Reviewer_Has_Given`
85-
- This might be an factor in a recommendation model, for instance, if you could determine that more prolific reviewers with hundreds of reviews were more likely to be negative rather than positive. However, the reviewer of any particular review is not identified with a unique code, and therefore cannot be linked to a set of reviews. There are 30 reviewers with 100 or more reviews, but it's hard to see how this can aid the recommendation model.
85+
- This might be a factor in a recommendation model, for instance, if you could determine that more prolific reviewers with hundreds of reviews were more likely to be negative rather than positive. However, the reviewer of any particular review is not identified with a unique code, and therefore cannot be linked to a set of reviews. There are 30 reviewers with 100 or more reviews, but it's hard to see how this can aid the recommendation model.
8686
- `Reviewer_Nationality`
8787
- Some people might think that certain nationalities are more likely to give a positive or negative review because of a national inclination. Be careful building such anecdotal views into your models. These are national (and sometimes racial) stereotypes, and each reviewer was an individual who wrote a review based on their experience. It may have been filtered through many lenses such as their previous hotel stays, the distance travelled, and their personal temperament. Thinking that their nationality was the reason for a review score is hard to justify.
8888

8989
##### Examples
9090

9191
| Average Score | Total Number Reviews | Reviewer Score | Negative <br />Review | Positive Review | Tags |
9292
| -------------- | ---------------------- | ---------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------- | ----------------------------------------------------------------------------------------- |
93-
| 7.8 | 1945 | 2.5 | This is currently not a hotel but a construction site I was terroized from early morning and all day with unacceptable building noise while resting after a long trip and working in the room People were working all day i e with jackhammers in the adjacent rooms I asked for a room change but no silent room was available To make thinks worse I was overcharged I checked out in the evening since I had to leave very early flight and received an appropriate bill A day later the hotel made another charge without my concent in excess of booked price It s a terrible place Don t punish yourself by booking here | Nothing Terrible place Stay away | Business trip Couple Standard Double Room Stayed 2 nights |
93+
| 7.8 | 1945 | 2.5 | This is currently not a hotel but a construction site I was terrorized from early morning and all day with unacceptable building noise while resting after a long trip and working in the room People were working all day i e with jackhammers in the adjacent rooms I asked for a room change but no silent room was available To make things worse I was overcharged I checked out in the evening since I had to leave very early flight and received an appropriate bill A day later the hotel made another charge without my consent in excess of booked price It's a terrible place Don't punish yourself by booking here | Nothing Terrible place Stay away | Business trip Couple Standard Double Room Stayed 2 nights |
9494

95-
As you can see, this guest did not have a happy stay at this hotel. The hotel has a good average score of 7.8 and 1945 reviews, but this reviewer gave it 2.5 and wrote 115 words about how negative their stay was. If they wrote nothing at all in the Positive_Review column, you might surmise there was nothing positive, but alas they wrote 7 words of warning. If we just counted words instead of the meaning, or sentiment of the words, we might have a skewed view of the reviewers intent. Strangely, their score of 2.5 is confusing, because if that hotel stay was so bad, why give it any points at all? Investigating the dataset closely, you'll see that the lowest possible score is 2.5, not 0. The highest possible score is 10.
95+
As you can see, this guest did not have a happy stay at this hotel. The hotel has a good average score of 7.8 and 1945 reviews, but this reviewer gave it 2.5 and wrote 115 words about how negative their stay was. If they wrote nothing at all in the Positive_Review column, you might surmise there was nothing positive, but alas they wrote 7 words of warning. If we just counted words instead of the meaning, or sentiment of the words, we might have a skewed view of the reviewer's intent. Strangely, their score of 2.5 is confusing, because if that hotel stay was so bad, why give it any points at all? Investigating the dataset closely, you'll see that the lowest possible score is 2.5, not 0. The highest possible score is 10.
9696

9797
##### Tags
9898

@@ -126,7 +126,7 @@ If you take the `Average_Score` columns, you might surmise it is the average of
126126

127127
To complicate things further, the hotel with the second highest number of reviews has a calculated average score of 8.12 and the dataset `Average_Score` is 8.1. Is this correct score a coincidence or is the first hotel a discrepancy?
128128

129-
On the possibility that these hotel might be an outlier, and that maybe most of the values tally up (but some do not for some reason) we will write a short programs next to explore the values in the dataset and determine the correct usage (or non-usage) of the values.
129+
On the possibility that these hotel might be an outlier, and that maybe most of the values tally up (but some do not for some reason) we will write a short program next to explore the values in the dataset and determine the correct usage (or non-usage) of the values.
130130

131131
> 🚨 A note of caution
132132
>
@@ -162,7 +162,7 @@ In this case, the data is already *clean*, that means that it is ready to work w
162162

163163
✅ You might have to work with data that required some initial processing to format it before applying NLP techniques, but not this time. If you had to, how would you handle non-English characters?
164164

165-
Take a moment to ensure you that once the data is loaded, you can explore it with code. It's very easy to want to focus on the `Negative_Review` and `Positive_Review` columns. They are filled with natural text for your NLP algorithms to process. But wait! Before you jump into the NLP and sentiment, you should follow the code below to ascertain if the values given in the dataset match the values you calculate with pandas.
165+
Take a moment to ensure that once the data is loaded, you can explore it with code. It's very easy to want to focus on the `Negative_Review` and `Positive_Review` columns. They are filled with natural text for your NLP algorithms to process. But wait! Before you jump into the NLP and sentiment, you should follow the code below to ascertain if the values given in the dataset match the values you calculate with pandas.
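
A minimal sketch of that kind of check, using a tiny made-up DataFrame in place of the real file. `Hotel_Name` is an assumed column name used for grouping; `Average_Score` and `Reviewer_Score` follow the columns discussed in this lesson, and rounding to one decimal is an assumption based on the example values.

```python
import pandas as pd

# Made-up rows standing in for the real dataset
df = pd.DataFrame({
    "Hotel_Name": ["Hotel A", "Hotel A", "Hotel B", "Hotel B"],
    "Average_Score": [7.8, 7.8, 8.1, 8.1],
    "Reviewer_Score": [2.5, 9.5, 8.0, 8.2],
})

# Mean reviewer score per hotel, rounded to one decimal
calculated = df.groupby("Hotel_Name")["Reviewer_Score"].mean().round(1)
# The stated average repeats on every row of a hotel, so take the first value
given = df.groupby("Hotel_Name")["Average_Score"].first()

# Put both side by side and compute the gap between them
comparison = pd.DataFrame({"given": given, "calculated": calculated})
comparison["difference"] = (comparison["given"] - comparison["calculated"]).round(1)
print(comparison)
```

Hotels with a non-zero `difference` are the ones where the stated average does not match the reviews present in the data.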
166166

167167
## Dataframe operations
168168

8-Reinforcement/README.md

+2-2
@@ -2,7 +2,7 @@
22

33
Reinforcement learning, RL, is seen as one of the basic machine learning paradigms, next to supervised learning and unsupervised learning. RL is all about decisions: delivering the right decisions or at least learning from them.
44

5-
Imagine you have a simulated environment such as the stock market. What happens if you impose a given regulation. Does it have a positive or negative effect? If something negative happens, you need to take this _negative reinforcement_, learn from it, and change course. If it's a positive outcome, you need to build on that _positive reinforcement_.
5+
Imagine you have a simulated environment such as the stock market. What happens if you impose a given regulation? Does it have a positive or negative effect? If something negative happens, you need to take this _negative reinforcement_, learn from it, and change course. If it's a positive outcome, you need to build on that _positive reinforcement_.
66

77
![peter and the wolf](images/peter.png)
88

@@ -26,7 +26,7 @@ In previous sections, you have seen two examples of machine learning problems:
2626
- **Supervised**, where we have datasets that suggest sample solutions to the problem we want to solve. [Classification](../4-Classification/README.md) and [regression](../2-Regression/README.md) are supervised learning tasks.
2727
- **Unsupervised**, in which we do not have labeled training data. The main example of unsupervised learning is [Clustering](../5-Clustering/README.md).
2828

29-
In this section, we will introduce you to a new type of learning problems that does not require labeled training data. There are several types of such problems:
29+
In this section, we will introduce you to a new type of learning problem that does not require labeled training data. There are several types of such problems:
3030

3131
- **[Semi-supervised learning](https://wikipedia.org/wiki/Semi-supervised_learning)**, where we have a lot of unlabeled data that can be used to pre-train the model.
3232
- **[Reinforcement learning](https://wikipedia.org/wiki/Reinforcement_learning)**, in which an agent learns how to behave by performing experiments in some simulated environment.
