diff --git a/.gitignore b/.gitignore index 4f5c86b..eb7a7db 100644 --- a/.gitignore +++ b/.gitignore @@ -13,3 +13,4 @@ lectures/data/housing-kaggle *.pyc *.tsv *.wav +lectures/data/animal-faces/ diff --git a/lectures/AmirAbdi/17_natural-language-processing.ipynb b/lectures/AmirAbdi/17_natural-language-processing.ipynb index aa44fa9..cf379c2 100644 --- a/lectures/AmirAbdi/17_natural-language-processing.ipynb +++ b/lectures/AmirAbdi/17_natural-language-processing.ipynb @@ -147,7 +147,7 @@ "- Describe the reasons and benefits of using pre-trained embeddings. \n", "- Load and use pre-trained word embeddings to find word similarities and analogies. \n", "- Demonstrate biases in embeddings and learn to watch out for such biases in pre-trained embeddings.\n", - "- Use word embeddings in text classification and document clustering using `spaCy`.\n", + "- Use word **embeddings** in text classification and document clustering using `spaCy`.\n", "- Explain the general idea of topic modeling. \n", "- Describe the input and output of topic modeling. \n", "- Carry out basic text preprocessing using `spaCy`. " @@ -577,7 +577,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 111, "metadata": { "slideshow": { "slide_type": "skip" @@ -610,7 +610,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 112, "metadata": {}, "outputs": [ { @@ -628,7 +628,7 @@ "True" ] }, - "execution_count": 2, + "execution_count": 112, "metadata": {}, "output_type": "execute_result" } @@ -647,7 +647,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 113, "metadata": {}, "outputs": [], "source": [ @@ -656,7 +656,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 114, "metadata": { "slideshow": { "slide_type": "slide" @@ -732,7 +732,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 115, "metadata": { "slideshow": { "slide_type": "slide" @@ -1109,7 +1109,7 @@ "[6342 rows x 6342 columns]" ] }, - "execution_count": 6, + "execution_count": 115, "metadata": {}, "output_type": "execute_result" } @@ -1129,7 +1129,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 116, "metadata": {}, "outputs": [], "source": [ @@ -1163,7 +1163,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 117, "metadata": {}, "outputs": [ { @@ -2031,7 +2031,7 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 121, "metadata": { "slideshow": { "slide_type": "slide" @@ -2139,7 +2139,7 @@ "9 Habs 0.661023" ] }, - "execution_count": 26, + "execution_count": 121, "metadata": {}, "output_type": "execute_result" } @@ -8106,6 +8106,84 @@ "#### Visualize topics" ] }, + { + "cell_type": "code", + "execution_count": 127, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 127, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "lda.per_word_topics" + ] + }, + { + "cell_type": "code", + "execution_count": 145, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[(3319, 0.0022999633),\n", + " (3336, 0.0013174261),\n", + " (658, 0.0012206235),\n", + " (3424, 0.0012169023),\n", + " (1118, 0.0011576178),\n", + " (3788, 0.0011387375),\n", + " (1117, 0.0011356168),\n", + " (1821, 0.0008851419),\n", + " (3612, 0.00069421926),\n", + " (3499, 0.0006651815)]" + ] + }, + "execution_count": 145, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "lda.get_topic_terms(0)" + 
]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 146,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[(114, 0.008096032),\n",
+       " (790, 0.008090814),\n",
+       " (865, 0.0070692785),\n",
+       " (1667, 0.0067776213),\n",
+       " (900, 0.006763307),\n",
+       " (863, 0.0062548574),\n",
+       " (1161, 0.005847539),\n",
+       " (857, 0.0057668686),\n",
+       " (1136, 0.005758805),\n",
+       " (164, 0.005744651)]"
+      ]
+     },
+     "execution_count": 146,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "lda.get_topic_terms(1)"
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": 96,
diff --git a/lectures/AmirAbdi/18_intro_to_computer-vision.ipynb b/lectures/AmirAbdi/18_intro_to_computer-vision.ipynb
index 422769f..0b98879 100644
--- a/lectures/AmirAbdi/18_intro_to_computer-vision.ipynb
+++ b/lectures/AmirAbdi/18_intro_to_computer-vision.ipynb
@@ -47,7 +47,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 59,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
@@ -84,23 +84,6 @@
   "from sklearn.preprocessing import StandardScaler"
  ]
 },
- {
-  "cell_type": "markdown",
-  "metadata": {
-   "slideshow": {
-    "slide_type": "slide"
-   }
-  },
-  "source": [
-   "### Learning objectives\n",
-   "\n",
-   "- Apply classifiers to **multi-class classification** algorithms.\n",
-   "- Explain the role of neural networks in machine learning, and the pros/cons of using them.\n",
-   "- Explain why the methods we've learned previously would not be effective on **image data**.\n",
-   "- Apply **pre-trained neural networks** to classification and regression problems.\n",
-   "- Utilize pre-trained networks as feature extractors and combine them with models we've learned previously."
-  ]
- },
 {
  "cell_type": "markdown",
  "metadata": {
@@ -136,7 +119,12 @@
  "metadata": {},
  "source": [
   "Answers:\n",
-   " -"
+   " - A\n",
+   " - B\n",
+   " - C\n",
+   " - D\n",
+   " \n",
+   " E is incorrect because we don't get the \"labels\" out of LDA; we just get K topics, and don't know their labels until we study them closely ourselves."
  ]
 },
 {
  "cell_type": "markdown",
  "metadata": {},
  "source": [
   "<br><br>"

" ] }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### Learning objectives\n", + "\n", + "- Apply classifiers to **multi-class classification** algorithms.\n", + "- Explain the role of neural networks in machine learning, and the pros/cons of using them.\n", + "- Explain why the methods we've learned previously would not be effective on **image data**.\n", + "- Apply **pre-trained neural networks** to classification and regression problems.\n", + "- Utilize pre-trained networks as feature extractors and combine them with models we've learned previously." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -188,7 +193,7 @@ } }, "source": [ - "### One vs. Rest \n", + "### One vs. Rest (OVR)\n", "\n", "- 1v{2,3}, 2v{1,3}, 3v{1,2}\n", "- Learn a binary model for each class which tries to separate that class from all of the other classes.\n", @@ -706,6 +711,14 @@ "- (B) For a 100-class classification problem, one-vs.-rest multi-class strategy will create 100 binary classifiers. " ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Answers:\n", + "- B" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -2111,7 +2124,7 @@ " # Zero the parameter gradients\n", " optimizer.zero_grad()\n", " \n", - " # Forward + backward + optimize\n", + " # Feed Forward + backward (back propagation) + optimize (updating weights)\n", " outputs = net(inputs)\n", " loss = criterion(outputs, labels)\n", " loss.backward()\n", @@ -2207,6 +2220,7 @@ "In the last lecture we used pre-trained embeddings to create text representations. \n", "We didn't train any models. \n", "**Q: Was that Transfer Learning?**\n", + "- Yes\n", "\n", "



" ] @@ -2499,7 +2513,7 @@ "\n", "**Q: Can this model classify images beyond the classes that it was trained on?**\n", "\n", - "A: ???" + "A: Yes, with transfer learning" ] }, { @@ -2753,7 +2767,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "- Now for each image in our dataset, we'll **extract a feature vector from a pre-trained model called densenet121**, which is trained on the ImageNet dataset. " + "- Now for each image in our dataset, we'll **extract a feature vector from a pre-trained model called densenet121**, which is trained on the **ImageNet dataset**. " ] }, {