Commit 54eff12

Created using Colaboratory
1 parent 2e1daa7 commit 54eff12

File tree

1 file changed: +266 −0 lines changed

Lemmatization.ipynb

Lines changed: 266 additions & 0 deletions

@@ -0,0 +1,266 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Lemmatization.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyPqUPmmp1nbyUoCVZk5yhR2",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/DataMinati/NLP-Legion/blob/main/Lemmatization.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vpcGmPYqemKg"
},
"source": [
"### Downloading packages"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "5pBjoCoNdjr3",
"outputId": "e6904133-10f0-4e09-ce06-5a349de55e9c"
},
"source": [
"import nltk  # needed here so the cell runs on its own before nltk.download\n",
"nltk.download('punkt')\n",
"nltk.download('stopwords')\n",
"nltk.download('wordnet')"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"[nltk_data] Downloading package punkt to /root/nltk_data...\n",
"[nltk_data] Package punkt is already up-to-date!\n",
"[nltk_data] Downloading package stopwords to /root/nltk_data...\n",
"[nltk_data] Package stopwords is already up-to-date!\n",
"[nltk_data] Downloading package wordnet to /root/nltk_data...\n",
"[nltk_data] Unzipping corpora/wordnet.zip.\n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"True"
]
},
"metadata": {
"tags": []
},
"execution_count": 7
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YN89SADeepHC"
},
"source": [
"### Importing necessary libraries"
]
},
{
"cell_type": "code",
"metadata": {
"id": "evW4SfZuddCx"
},
"source": [
"import nltk\n",
"from nltk.stem import WordNetLemmatizer\n",
"from nltk.corpus import stopwords"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "PquYfPhNesu2"
},
"source": [
"### Storing a sentence"
]
},
{
"cell_type": "code",
"metadata": {
"id": "pnQPYIhCdhbY"
},
"source": [
"paragraph = \"\"\"Thank you all so very much. Thank you to the Academy. \n",
" Thank you to all of you in this room. I have to congratulate \n",
" the other incredible nominees this year. The Revenant was \n",
" the product of the tireless efforts of an unbelievable cast\n",
" and crew. First off, to my brother in this endeavor, Mr. Tom \n",
" Hardy. Tom, your talent on screen can only be surpassed by \n",
" your friendship off screen … thank you for creating a\n",
" transcendent cinematic experience. Thank you to everybody at \n",
" Fox and New Regency … my entire team. I have to thank \n",
" everyone from the very onset of my career … To my parents; \n",
" none of this would be possible without you. And to my \n",
" friends, I love you dearly; you know who you are. And lastly,\n",
" I just want to say this: Making The Revenant was about\n",
" man's relationship to the natural world. A world that we\n",
" collectively felt in 2015 as the hottest year in recorded\n",
" history. Our production needed to move to the southern\n",
" tip of this planet just to be able to find snow. Climate\n",
" change is real, it is happening right now. It is the most\n",
" urgent threat facing our entire species, and we need to work\n",
" collectively together and stop procrastinating. We need to\n",
" support leaders around the world who do not speak for the \n",
" big polluters, but who speak for all of humanity, for the\n",
" indigenous people of the world, for the billions and \n",
" billions of underprivileged people out there who would be\n",
" most affected by this. For our children’s children, and \n",
" for those people out there whose voices have been drowned\n",
" out by the politics of greed. I thank you all for this \n",
" amazing award tonight. Let us not take this planet for \n",
" granted. I do not take tonight for granted. Thank you so very much.\"\"\""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "hU-DCnKhewEx"
},
"source": [
"### Making an object of Lemmatizer"
]
},
{
"cell_type": "code",
"metadata": {
"id": "OaadDnYGdvT7"
},
"source": [
"sentences = nltk.sent_tokenize(paragraph)\n",
"lemmatizer = WordNetLemmatizer()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "up7CHyC1e0Sd"
},
"source": [
"### Lemmatization of the text"
]
},
{
"cell_type": "code",
"metadata": {
"id": "MjI2MdPFdxff"
},
"source": [
"# Lemmatize each sentence, dropping English stop words\n",
"stop_words = set(stopwords.words('english'))  # build the set once, not per word\n",
"for i in range(len(sentences)):\n",
"    words = nltk.word_tokenize(sentences[i])\n",
"    words = [lemmatizer.lemmatize(word) for word in words if word not in stop_words]\n",
"    sentences[i] = ' '.join(words)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "0O6nYjkVe31B"
},
"source": [
"### Display the final result"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "X0OuMYIydzqL",
"outputId": "2583d0ec-72d4-410f-cabf-fa287fc1bcb0"
},
"source": [
"sentences"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"['Thank much .',\n",
" 'Thank Academy .',\n",
" 'Thank room .',\n",
" 'I congratulate incredible nominee year .',\n",
" 'The Revenant product tireless effort unbelievable cast crew .',\n",
" 'First , brother endeavor , Mr. Tom Hardy .',\n",
" 'Tom , talent screen surpassed friendship screen … thank creating transcendent cinematic experience .',\n",
" 'Thank everybody Fox New Regency … entire team .',\n",
" 'I thank everyone onset career … To parent ; none would possible without .',\n",
" 'And friend , I love dearly ; know .',\n",
" \"And lastly , I want say : Making The Revenant man 's relationship natural world .\",\n",
" 'A world collectively felt 2015 hottest year recorded history .',\n",
" 'Our production needed move southern tip planet able find snow .',\n",
" 'Climate change real , happening right .',\n",
" 'It urgent threat facing entire specie , need work collectively together stop procrastinating .',\n",
" 'We need support leader around world speak big polluter , speak humanity , indigenous people world , billion billion underprivileged people would affected .',\n",
" 'For child ’ child , people whose voice drowned politics greed .',\n",
" 'I thank amazing award tonight .',\n",
" 'Let u take planet granted .',\n",
" 'I take tonight granted .',\n",
" 'Thank much .']"
]
},
"metadata": {
"tags": []
},
"execution_count": 12
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "lkpby05ieQVC"
},
"source": [
""
],
"execution_count": null,
"outputs": []
}
]
}

0 commit comments