+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "name": "Lemmatization.ipynb",
+      "provenance": [],
+      "collapsed_sections": [],
+      "authorship_tag": "ABX9TyPqUPmmp1nbyUoCVZk5yhR2",
+      "include_colab_link": true
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "view-in-github",
+        "colab_type": "text"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/DataMinati/NLP-Legion/blob/main/Lemmatization.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "vpcGmPYqemKg"
+      },
+      "source": [
+        "### Downloading packages"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "5pBjoCoNdjr3",
+        "outputId": "e6904133-10f0-4e09-ce06-5a349de55e9c"
+      },
+      "source": [
+        "# Import nltk here so this cell also runs on a fresh kernel\n",
+        "import nltk\n",
+        "nltk.download('punkt')\n",
+        "nltk.download('stopwords')\n",
+        "nltk.download('wordnet')"
+      ],
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "text": [
+            "[nltk_data] Downloading package punkt to /root/nltk_data...\n",
+            "[nltk_data] Package punkt is already up-to-date!\n",
+            "[nltk_data] Downloading package stopwords to /root/nltk_data...\n",
+            "[nltk_data] Package stopwords is already up-to-date!\n",
+            "[nltk_data] Downloading package wordnet to /root/nltk_data...\n",
+            "[nltk_data] Unzipping corpora/wordnet.zip.\n"
+          ],
+          "name": "stdout"
+        },
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "True"
+            ]
+          },
+          "metadata": {
+            "tags": []
+          },
+          "execution_count": 7
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "YN89SADeepHC"
+      },
+      "source": [
+        "### Importing necessary libraries"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "evW4SfZuddCx"
+      },
+      "source": [
+        "import nltk\n",
+        "from nltk.stem import WordNetLemmatizer\n",
+        "from nltk.corpus import stopwords"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "PquYfPhNesu2"
+      },
+      "source": [
+        "### Storing a paragraph"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "pnQPYIhCdhbY"
+      },
+      "source": [
+        "paragraph = \"\"\"Thank you all so very much. Thank you to the Academy. \n",
+        "Thank you to all of you in this room. I have to congratulate \n",
+        "the other incredible nominees this year. The Revenant was \n",
+        "the product of the tireless efforts of an unbelievable cast\n",
+        "and crew. First off, to my brother in this endeavor, Mr. Tom \n",
+        "Hardy. Tom, your talent on screen can only be surpassed by \n",
+        "your friendship off screen … thank you for creating a t\n",
+        "ranscendent cinematic experience. Thank you to everybody at \n",
+        "Fox and New Regency … my entire team. I have to thank \n",
+        "everyone from the very onset of my career … To my parents; \n",
+        "none of this would be possible without you. And to my \n",
+        "friends, I love you dearly; you know who you are. And lastly,\n",
+        "I just want to say this: Making The Revenant was about\n",
+        "man's relationship to the natural world. A world that we\n",
+        "collectively felt in 2015 as the hottest year in recorded\n",
+        "history. Our production needed to move to the southern\n",
+        "tip of this planet just to be able to find snow. Climate\n",
+        "change is real, it is happening right now. It is the most\n",
+        "urgent threat facing our entire species, and we need to work\n",
+        "collectively together and stop procrastinating. We need to\n",
+        "support leaders around the world who do not speak for the \n",
+        "big polluters, but who speak for all of humanity, for the\n",
+        "indigenous people of the world, for the billions and \n",
+        "billions of underprivileged people out there who would be\n",
+        "most affected by this. For our children’s children, and \n",
+        "for those people out there whose voices have been drowned\n",
+        "out by the politics of greed. I thank you all for this \n",
+        "amazing award tonight. Let us not take this planet for \n",
+        "granted. I do not take tonight for granted. Thank you so very much.\"\"\""
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "hU-DCnKhewEx"
+      },
+      "source": [
+        "### Splitting the paragraph into sentences and creating the Lemmatizer object"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "OaadDnYGdvT7"
+      },
+      "source": [
+        "sentences = nltk.sent_tokenize(paragraph)\n",
+        "lemmatizer = WordNetLemmatizer()"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "up7CHyC1e0Sd"
+      },
+      "source": [
+        "### Lemmatization of the text"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "MjI2MdPFdxff"
+      },
+      "source": [
+        "# Lemmatization\n",
+        "# Build the stopword set once rather than once per token\n",
+        "stop_words = set(stopwords.words('english'))\n",
+        "for i in range(len(sentences)):\n",
+        "    words = nltk.word_tokenize(sentences[i])\n",
+        "    # Drop stopwords (case-sensitive match), then lemmatize what remains\n",
+        "    words = [lemmatizer.lemmatize(word) for word in words if word not in stop_words]\n",
+        "    sentences[i] = ' '.join(words)"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "0O6nYjkVe31B"
+      },
+      "source": [
+        "### Displaying the final result"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "X0OuMYIydzqL",
+        "outputId": "2583d0ec-72d4-410f-cabf-fa287fc1bcb0"
+      },
+      "source": [
+        "sentences"
+      ],
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "['Thank much .',\n",
+              " 'Thank Academy .',\n",
+              " 'Thank room .',\n",
+              " 'I congratulate incredible nominee year .',\n",
+              " 'The Revenant product tireless effort unbelievable cast crew .',\n",
+              " 'First , brother endeavor , Mr. Tom Hardy .',\n",
+              " 'Tom , talent screen surpassed friendship screen … thank creating ranscendent cinematic experience .',\n",
+              " 'Thank everybody Fox New Regency … entire team .',\n",
+              " 'I thank everyone onset career … To parent ; none would possible without .',\n",
+              " 'And friend , I love dearly ; know .',\n",
+              " \"And lastly , I want say : Making The Revenant man 's relationship natural world .\",\n",
+              " 'A world collectively felt 2015 hottest year recorded history .',\n",
+              " 'Our production needed move southern tip planet able find snow .',\n",
+              " 'Climate change real , happening right .',\n",
+              " 'It urgent threat facing entire specie , need work collectively together stop procrastinating .',\n",
+              " 'We need support leader around world speak big polluter , speak humanity , indigenous people world , billion billion underprivileged people would affected .',\n",
+              " 'For child ’ child , people whose voice drowned politics greed .',\n",
+              " 'I thank amazing award tonight .',\n",
+              " 'Let u take planet granted .',\n",
+              " 'I take tonight granted .',\n",
+              " 'Thank much .']"
+            ]
+          },
+          "metadata": {
+            "tags": []
+          },
+          "execution_count": 12
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "lkpby05ieQVC"
+      },
+      "source": [
+        ""
+      ],
+      "execution_count": null,
+      "outputs": []
+    }
+  ]
+}
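
The lemmatization step in the notebook (tokenize each sentence, drop stopwords, lemmatize the remaining tokens, rejoin) can be sketched as a small reusable function. This is only an illustrative sketch, not the notebook's own code: `lemmatize_sentences`, `toy_lemmatize`, `TOY_LEMMAS`, and `TOY_STOP_WORDS` are hypothetical names, and the toy lemma table and stopword set stand in for WordNet and the NLTK stopword corpus so the example runs without any downloads.

```python
from typing import Callable, Iterable, List, Set


def lemmatize_sentences(
    sentences: Iterable[str],
    lemmatize: Callable[[str], str],
    stop_words: Set[str],
    tokenize: Callable[[str], List[str]] = str.split,
) -> List[str]:
    """Drop stopwords from each sentence, lemmatize the rest, rejoin with spaces."""
    cleaned = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        # Stopword filtering is case-sensitive, as in the notebook's loop.
        kept = [lemmatize(tok) for tok in tokens if tok not in stop_words]
        cleaned.append(" ".join(kept))
    return cleaned


# Toy stand-ins (hypothetical, for illustration only) so the sketch runs without NLTK data.
TOY_LEMMAS = {"nominees": "nominee", "efforts": "effort"}
TOY_STOP_WORDS = {"the", "of", "to", "this"}


def toy_lemmatize(word: str) -> str:
    # Look the word up in the toy table; unknown words are returned unchanged.
    return TOY_LEMMAS.get(word, word)


result = lemmatize_sentences(
    ["the other incredible nominees this year", "tireless efforts of the crew"],
    toy_lemmatize,
    TOY_STOP_WORDS,
)
print(result)
```

To reproduce the notebook's behaviour, one would presumably pass `WordNetLemmatizer().lemmatize`, `set(stopwords.words('english'))`, and `nltk.word_tokenize` in place of the toy arguments. Passing the stopword set in as an argument also means it is built once, not once per token.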