Commit

add support to py3
I hit some errors when I installed vizdoom, so I just modified these files.
wwxFromTju committed Mar 9, 2017
1 parent f711b86 commit 0685c12
Showing 9 changed files with 865 additions and 218 deletions.
128 changes: 87 additions & 41 deletions Contextual-Policy.ipynb
@@ -2,7 +2,10 @@
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Simple Reinforcement Learning in Tensorflow Part 1.5: \n",
"## The Contextual Bandits\n",
@@ -15,7 +18,9 @@
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
Expand All @@ -26,17 +31,22 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"### The Contextual Bandits\n",
"Here we define our contextual bandits. In this example, we are using three four-armed bandit. What this means is that each bandit has four arms that can be pulled. Each bandit has different success probabilities for each arm, and as such requires different actions to obtain the best result. The pullBandit function generates a random number from a normal distribution with a mean of 0. The lower the bandit number, the more likely a positive reward will be returned. We want our agent to learn to always choose the bandit-arm that will most often give a positive reward, depending on the Bandit presented."
]
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 2,
"metadata": {
"collapsed": true
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
@@ -66,17 +76,22 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"### The Policy-Based Agent\n",
"The code below established our simple neural agent. It takes as input the current state, and returns an action. This allows the agent to take actions which are conditioned on the state of the environment, a critical step toward being able to solve full RL problems. The agent uses a single set of weights, within which each value is an estimate of the value of the return from choosing a particular arm given a bandit. We use a policy gradient method to update the agent by moving the value for the selected action toward the recieved reward."
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 3,
"metadata": {
"collapsed": false
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
@@ -102,49 +117,57 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"### Training the Agent"
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"We will train our agent by getting a state from the environment, take an action, and recieve a reward. Using these three things, we can know how to properly update our network in order to more often choose actions given states that will yield the highest rewards over time."
]
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 4,
"metadata": {
"collapsed": false
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mean reward for the 3 bandits: [ 0. -0.25 0. ]\n",
"Mean reward for the 3 bandits: [ 9. 42. 33.75]\n",
"Mean reward for the 3 bandits: [ 45.5 80. 67.75]\n",
"Mean reward for the 3 bandits: [ 86.25 116.75 101.25]\n",
"Mean reward for the 3 bandits: [ 122.5 153.25 139.5 ]\n",
"Mean reward for the 3 bandits: [ 161.75 186.25 179.25]\n",
"Mean reward for the 3 bandits: [ 201. 224.75 216. ]\n",
"Mean reward for the 3 bandits: [ 240.25 264. 250. ]\n",
"Mean reward for the 3 bandits: [ 280.25 301.75 285.25]\n",
"Mean reward for the 3 bandits: [ 317.75 340.25 322.25]\n",
"Mean reward for the 3 bandits: [ 356.5 377.5 359.25]\n",
"Mean reward for the 3 bandits: [ 396.25 415.25 394.75]\n",
"Mean reward for the 3 bandits: [ 434.75 451.5 430.5 ]\n",
"Mean reward for the 3 bandits: [ 476.75 490. 461.5 ]\n",
"Mean reward for the 3 bandits: [ 513.75 533.75 491.75]\n",
"Mean reward for the 3 bandits: [ 548.25 572. 527.5 ]\n",
"Mean reward for the 3 bandits: [ 587.5 610.75 562. ]\n",
"Mean reward for the 3 bandits: [ 628.75 644.25 600.25]\n",
"Mean reward for the 3 bandits: [ 665.75 684.75 634.75]\n",
"Mean reward for the 3 bandits: [ 705.75 719.75 668.25]\n",
"Mean reward for each of the 3 bandits: [ 0. 0. 0.25]\n",
"Mean reward for each of the 3 bandits: [ 26.5 38.25 35.5 ]\n",
"Mean reward for each of the 3 bandits: [ 68.25 75.25 70.75]\n",
"Mean reward for each of the 3 bandits: [ 104.25 112.25 107.25]\n",
"Mean reward for each of the 3 bandits: [ 142.5 147.5 145.75]\n",
"Mean reward for each of the 3 bandits: [ 181.5 185.75 178.5 ]\n",
"Mean reward for each of the 3 bandits: [ 215.5 223.75 220. ]\n",
"Mean reward for each of the 3 bandits: [ 256.5 260.75 249.5 ]\n",
"Mean reward for each of the 3 bandits: [ 293.5 300.25 287.5 ]\n",
"Mean reward for each of the 3 bandits: [ 330.25 341. 323.5 ]\n",
"Mean reward for each of the 3 bandits: [ 368.75 377. 359. ]\n",
"Mean reward for each of the 3 bandits: [ 411.5 408.75 395. ]\n",
"Mean reward for each of the 3 bandits: [ 447. 447. 429.75]\n",
"Mean reward for each of the 3 bandits: [ 484. 482.75 466. ]\n",
"Mean reward for each of the 3 bandits: [ 522.5 520. 504.75]\n",
"Mean reward for each of the 3 bandits: [ 560.25 557.75 538.25]\n",
"Mean reward for each of the 3 bandits: [ 597.75 596.25 574.75]\n",
"Mean reward for each of the 3 bandits: [ 636.5 630.5 611.25]\n",
"Mean reward for each of the 3 bandits: [ 675.25 670. 644.5 ]\n",
"Mean reward for each of the 3 bandits: [ 710.5 706.5 682.75]\n",
"The agent thinks action 4 for bandit 1 is the most promising....\n",
"...and it was right!\n",
"The agent thinks action 2 for bandit 2 is the most promising....\n",
@@ -189,34 +212,57 @@
" #Update our running tally of scores.\n",
" total_reward[s,action] += reward\n",
" if i % 500 == 0:\n",
" print \"Mean reward for each of the \" + str(cBandit.num_bandits) + \" bandits: \" + str(np.mean(total_reward,axis=1))\n",
" print(\"Mean reward for each of the \" + str(cBandit.num_bandits) + \" bandits: \" + str(np.mean(total_reward,axis=1)))\n",
" i+=1\n",
"for a in range(cBandit.num_bandits):\n",
" print \"The agent thinks action \" + str(np.argmax(ww[a])+1) + \" for bandit \" + str(a+1) + \" is the most promising....\"\n",
" print(\"The agent thinks action \" + str(np.argmax(ww[a])+1) + \" for bandit \" + str(a+1) + \" is the most promising....\")\n",
" if np.argmax(ww[a]) == np.argmin(cBandit.bandits[a]):\n",
" print \"...and it was right!\"\n",
" print(\"...and it was right!\")\n",
" else:\n",
" print \"...and it was wrong!\""
" print(\"...and it was wrong!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 2",
"display_name": "Python 3",
"language": "python",
"name": "python2"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
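
Apart from the two new empty cells, the substantive changes in this file are the conversion of Python 2 `print` statements to Python 3 `print()` calls and the switch of the kernelspec and language_info metadata from Python 2 to Python 3. As a hedged side note (not what this commit does), a notebook that needs to stay runnable on both interpreters could instead keep a single code path via a `__future__` import:

```python
# Makes print() behave as a function on Python 2 as well as Python 3,
# so the same cells run unchanged on either interpreter.
from __future__ import print_function

print("Mean reward for each of the 3 bandits:", [0.0, 0.0, 0.25])
```
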