Created using Colaboratory

MainakRepositor · MainakRepositor · commit 323a7b3ef3f6 · 2021-08-04T16:08:46.000+05:30
diff --git a/TF_IDF.ipynb b/TF_IDF.ipynb
@@ -0,0 +1,264 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "name": "TF-IDF.ipynb",
+      "provenance": [],
+      "collapsed_sections": [],
+      "authorship_tag": "ABX9TyO9FenMhAIC41DvapEM6kFf",
+      "include_colab_link": true
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "view-in-github",
+        "colab_type": "text"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/DataMinati/NLP-Legion/blob/main/TF_IDF.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "zW9eFT2NP_zs"
+      },
+      "source": [
+        "### Downloading the NLTK data"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "M8JglsAuNYT7",
+        "outputId": "6e147693-84d1-4c4e-b20c-e84f589b7991"
+      },
+      "source": [
+        "import nltk\n",
+        "for resource in ('punkt', 'stopwords', 'wordnet'):\n",
+        "    nltk.download(resource)"
+      ],
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "text": [
+            "[nltk_data] Downloading package punkt to /root/nltk_data...\n",
+            "[nltk_data]   Package punkt is already up-to-date!\n",
+            "[nltk_data] Downloading package stopwords to /root/nltk_data...\n",
+            "[nltk_data]   Package stopwords is already up-to-date!\n",
+            "[nltk_data] Downloading package wordnet to /root/nltk_data...\n",
+            "[nltk_data]   Package wordnet is already up-to-date!\n"
+          ],
+          "name": "stdout"
+        },
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "True"
+            ]
+          },
+          "metadata": {
+            "tags": []
+          },
+          "execution_count": 9
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "oTYLK6Z2QDkH"
+      },
+      "source": [
+        "### Importing the libraries"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "cef79IiWNyXX"
+      },
+      "source": [
+        "import nltk\n",
+        "import re\n",
+        "from nltk.corpus import stopwords\n",
+        "from nltk.stem.porter import PorterStemmer\n",
+        "from nltk.stem import WordNetLemmatizer\n",
+        "from sklearn.feature_extraction.text import TfidfVectorizer"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "MopqatR1QGPr"
+      },
+      "source": [
+        "### Storing the text"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "zItIfRnJPQJd"
+      },
+      "source": [
+        "paragraph =  \"\"\"I have three visions for India. In 3000 years of our history, people from all over \n",
+        "               the world have come and invaded us, captured our lands, conquered our minds. \n",
+        "               From Alexander onwards, the Greeks, the Turks, the Moguls, the Portuguese, the British,\n",
+        "               the French, the Dutch, all of them came and looted us, took over what was ours. \n",
+        "               Yet we have not done this to any other nation. We have not conquered anyone. \n",
+        "               We have not grabbed their land, their culture, \n",
+        "               their history and tried to enforce our way of life on them. \n",
+        "               Why? Because we respect the freedom of others.That is why my \n",
+        "               first vision is that of freedom. I believe that India got its first vision of \n",
+        "               this in 1857, when we started the War of Independence. It is this freedom that\n",
+        "               we must protect and nurture and build on. If we are not free, no one will respect us.\n",
+        "               My second vision for India’s development. For fifty years we have been a developing nation.\n",
+        "               It is time we see ourselves as a developed nation. We are among the top 5 nations of the world\n",
+        "               in terms of GDP. We have a 10 percent growth rate in most areas. Our poverty levels are falling.\n",
+        "               Our achievements are being globally recognised today. Yet we lack the self-confidence to\n",
+        "               see ourselves as a developed nation, self-reliant and self-assured. Isn’t this incorrect?\n",
+        "               I have a third vision. India must stand up to the world. Because I believe that unless India \n",
+        "               stands up to the world, no one will respect us. Only strength respects strength. We must be \n",
+        "               strong not only as a military power but also as an economic power. Both must go hand-in-hand. \n",
+        "               My good fortune was to have worked with three great minds. Dr. Vikram Sarabhai of the Dept. of \n",
+        "               space, Professor Satish Dhawan, who succeeded him and Dr. Brahm Prakash, father of nuclear material.\n",
+        "               I was lucky to have worked with all three of them closely and consider this the great opportunity of my life. \n",
+        "               I see four milestones in my career\"\"\"\n",
+        "               "
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "sG8etbOJQIyH"
+      },
+      "source": [
+        "### Cleaning the texts"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "DCU-SRAuPRjx"
+      },
+      "source": [
+        "# Cleaning the texts: keep letters only, lowercase, drop stopwords, stem\n",
+        "ps = PorterStemmer()\n",
+        "stop_words = set(stopwords.words('english'))  # build the set once, not per word\n",
+        "sentences = nltk.sent_tokenize(paragraph)\n",
+        "corpus = []\n",
+        "for sentence in sentences:\n",
+        "    review = re.sub('[^a-zA-Z]', ' ', sentence)\n",
+        "    review = review.lower()\n",
+        "    review = review.split()\n",
+        "    review = [ps.stem(word) for word in review if word not in stop_words]\n",
+        "    review = ' '.join(review)\n",
+        "    corpus.append(review)"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "2NGHmaRxQMzx"
+      },
+      "source": [
+        "### Creating the TF-IDF model"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "cIKWmdNdN7E0"
+      },
+      "source": [
+        "# Creating the TF-IDF model\n",
+        "cv = TfidfVectorizer()\n",
+        "X = cv.fit_transform(corpus).toarray()"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "diGhJq-sQO-t"
+      },
+      "source": [
+        "### Displaying the result"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "xPEARKRHPZTQ",
+        "outputId": "5cf19576-495e-4df2-b6c7-7a2e8b83f052"
+      },
+      "source": [
+        "X"
+      ],
+      "execution_count": null,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,\n",
+              "        0.        ],\n",
+              "       [0.        , 0.        , 0.        , ..., 0.25057734, 0.29539106,\n",
+              "        0.        ],\n",
+              "       [0.        , 0.28201784, 0.        , ..., 0.        , 0.        ,\n",
+              "        0.        ],\n",
+              "       ...,\n",
+              "       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,\n",
+              "        0.        ],\n",
+              "       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,\n",
+              "        0.        ],\n",
+              "       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,\n",
+              "        0.        ]])"
+            ]
+          },
+          "metadata": {
+            "tags": []
+          },
+          "execution_count": 15
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "GAGZWkLfPaa6"
+      },
+      "source": [
+        ""
+      ],
+      "execution_count": null,
+      "outputs": []
+    }
+  ]
+}
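
Note on the TF-IDF step: the notebook calls `TfidfVectorizer()` with default settings and prints the resulting array without showing how the weights are derived. The sketch below reproduces the arithmetic in plain Python so the values are easier to reason about. It assumes the library's documented defaults (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, L2-normalised rows) and splits on whitespace rather than using the vectorizer's own tokenizer. The helper `tfidf_rows` and the `toy_corpus` strings are hypothetical and are not part of the notebook.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Return (vocabulary, rows) using smoothed IDF and L2-normalised rows."""
    # Assumption: documents are already cleaned, so whitespace splitting suffices.
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        raw = [counts[t] * idf[t] for t in vocab]
        # L2 normalisation; fall back to 1.0 for an empty document.
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        rows.append([v / norm for v in raw])
    return vocab, rows


# Hypothetical, already-cleaned sentences in the style of the notebook's corpus.
toy_corpus = ["three vision india", "vision freedom", "respect freedom"]
vocab, rows = tfidf_rows(toy_corpus)

for doc, row in zip(toy_corpus, rows):
    weights = {t: round(w, 3) for t, w in zip(vocab, row) if w > 0}
    print(doc, "->", weights)
```

Each row has one column per vocabulary term, which is why most entries in the printed `X` are zero: a sentence contains only a handful of the corpus's terms. Within a row, terms that occur in fewer sentences receive a larger weight than terms shared across many sentences.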