Predictions support in tasks.json. (#121)

* Predictions support in tasks.json.
* Examples with predictions included.
* NER example.
* Some.
* Completions back.
* Fixed examples.
* Copy predictions on front.
* Some.
* Connect your running models for prediction prelabeling, active learning and retraining (#127)
* WIP: ml backend connection
* add missed module
* add dependencies
* working ml backend
* fix predictions array
* add predict api
* copy prediction button
* add machine learning integration readme
* make the copy prediction button functional convert files to new format
* correct tasks with predictions
* Fixes.
* move project,ml_backend,ml_api to models.py, add comments
* fix get_schema
* Fixes.
* Fixes for ml_backend is None.
* Remove redundant print.
* update docs
* Fixes.
* Fix.
* docs
* ls for teams
* Buttons added
* Some.
* Slack image added.
* modify logger
* fix logging levels, null train jobs
* add train job restore, log formatters
HumanSignal · Nov 28, 2019 · 7d26e77
1 parent ef107ec
commit 7d26e77
Showing 43 changed files with 2,705 additions and 1,568 deletions.
diff --git a/README.md b/README.md
@@ -56,6 +56,43 @@ Check [documentation](https://labelstud.io/guide/backend.html) about backend + f
 docker run -p 8200:8200 -t -i heartexlabs/label-studio -c config.json -l ../examples/chatbot_analysis/config.xml -i ../examples/chatbot_analysis/tasks.json -o output
 ```
 
+### Machine learning integration
+
+You can connect your favorite machine learning framework to Label Studio by using the [Heartex SDK](https://github.com/heartexlabs/pyheartex).
+
+This lets you:
+- use model predictions for prelabeling
+- update (retrain) your model continuously as new annotations arrive
+- label in active learning mode
+- instantly create a running, production-ready prediction service
+
+Here is a quick tutorial showing how to do this with a simple image classification example:
+
+1. Clone pyheartex, and start serving:
+    ```bash
+    git clone https://github.com/heartexlabs/pyheartex.git
+    cd pyheartex/examples/docker
+    docker-compose up -d
+    ```
+2. Specify the running server in your backend config (`config.json`):
+    ```json
+    "ml_backend": {
+      "url": "http://localhost:9090",
+      "model_name": "my_super_model"
+    }
+    ```
+3. Launch Label Studio with [image classification config](examples/image_classification/config.xml):
+    ```bash
+    python server.py -l ../examples/image_classification/config.xml
+    ```
+
+Once you're satisfied with the prelabeling results, you can immediately send prediction requests via the REST API:
+```bash
+curl -X POST -H 'Content-Type: application/json' -d '{"image_url": "https://go.heartex.net/static/samples/kittens.jpg"}' http://localhost:8200/predict
+```
+
+Feel free to experiment with other models and frameworks besides image classifiers (see the instructions [here](https://github.com/heartexlabs/pyheartex#advanced-usage)).
+
 ## Changelog
 
 Detailed changes for each release are documented in the [release notes](https://github.com/heartexlabs/label-studio/releases).
@@ -73,6 +110,10 @@ Please make sure to read the
 - [Contributing Guideline](/CONTRIBUTING.md)
 - [Code Of Conduct](/CODE_OF_CONDUCT.md)
 
+## Label Studio for Teams, Startups, and Enterprises
+
+Label Studio for Teams is our enterprise edition (cloud and on-prem), which includes a data manager, high-quality baseline models, active learning, collaborator support, and more. Please visit the [website](https://www.heartex.ai/) to learn more.
+
 ## License
 
 This software is licensed under the [Apache 2.0 LICENSE](/LICENSE) © [Heartex](https://www.heartex.net/).
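
The `curl` call added to the README above can also be issued from Python. The following is a minimal sketch, assuming the server listens on `http://localhost:8200` as in the README; the helper names `build_predict_request` and `predict` are hypothetical and not part of Label Studio.

```python
import json
import urllib.request


def build_predict_request(image_url, base_url="http://localhost:8200"):
    """Build (but do not send) a POST request for the /predict endpoint.

    The payload shape {"image_url": ...} is copied from the README's curl
    example; other label configs may expect different keys.
    """
    payload = json.dumps({"image_url": image_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(image_url, base_url="http://localhost:8200", timeout=10):
    """Send the request and decode the JSON body returned by the server."""
    request = build_predict_request(image_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `predict("https://go.heartex.net/static/samples/kittens.jpg")` requires a running server with an ML backend connected; according to `api_predict` in this commit, the server answers with status 400 otherwise, which `urlopen` raises as `urllib.error.HTTPError`.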

diff --git a/backend/config.json b/backend/config.json
@@ -13,5 +13,11 @@
   "editor": {
     "build_path": "../build/static",
     "debug": false
-  }
+  },
+
+  "!ml_backend": {
+    "url": "http://localhost:9090",
+    "model_name": "my_super_model"
+  },
+  "sampling": "uniform"
 }
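
Note that the key added here is `"!ml_backend"`, while `reload_config()` in `server.py` looks up `"ml_backend"`. Judging from this commit alone, the `!` prefix appears to act as a disabled placeholder; that reading is an inference, not documented behavior. A small sketch of the lookup, with the hypothetical helper `ml_backend_params`:

```python
import json

# Trimmed copy of the keys added to backend/config.json in this commit.
CONFIG_TEXT = """
{
  "!ml_backend": {
    "url": "http://localhost:9090",
    "model_name": "my_super_model"
  },
  "sampling": "uniform"
}
"""


def ml_backend_params(config):
    """Mirror the lookup in reload_config(): only the exact key counts."""
    return config.get("ml_backend")


shipped = json.loads(CONFIG_TEXT)
print(ml_backend_params(shipped))  # None, so no backend would be connected

# Renaming the key (dropping the "!") makes the same lookup succeed.
enabled = dict(shipped)
enabled["ml_backend"] = enabled.pop("!ml_backend")
print(ml_backend_params(enabled)["url"])  # http://localhost:9090
```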
diff --git a/backend/logger.json b/backend/logger.json
@@ -1,10 +1,16 @@
 {
   "version": 1,
+  "formatters": {
+    "standard": {
+      "format": "[%(asctime)s] [%(name)s] [%(levelname)s] %(message)s"
+    }
+  },
   "handlers": {
     "console": {
       "class": "logging.StreamHandler",
       "level": "DEBUG",
-      "stream": "ext://sys.stdout"
+      "stream": "ext://sys.stdout",
+      "formatter": "standard"
     }
   },
   "loggers": {
@@ -17,9 +23,9 @@
     }
   },
   "root": {
-    "level": "DEBUG",
+    "level": "ERROR",
     "handlers": [
       "console"
     ]
   }
-}
+}
diff --git a/backend/requirements.txt b/backend/requirements.txt
@@ -7,3 +7,4 @@ appdirs==1.4.3
 mixpanel==4.4.0
 pandas==0.24.0
 Pillow==6.2.0
+attrs==19.1.0
diff --git a/backend/server.py b/backend/server.py
@@ -5,37 +5,56 @@
 import flask
 import json  # it MUST be included after flask!
 import utils.db as db
+import logging
 
+from copy import deepcopy
 from inspect import currentframe, getframeinfo
 from flask import request, jsonify, make_response, Response
 from utils.misc import (
     exception_treatment, log_config, log, config_line_stripped, load_config
 )
 from utils.analytics import Analytics
+from utils.models import DEFAULT_PROJECT_ID, Project, MLBackend
+
+logger = logging.getLogger(__name__)
 
 
 app = flask.Flask(__name__, static_url_path='')
 app.secret_key = 'A0Zrdqwf1AQWj12ajkhgFN]dddd/,?RfDWQQT'
 
-
 # init
 c = None
 # load editor config from XML
 label_config_line = None
 # analytics
 analytics = None
+# machine learning backend
+ml_backend = None
+# project object with lazy initialization
+project = None
 
 
 def reload_config():
     global c
     global label_config_line
     global analytics
+    global ml_backend
+    global project
     c = load_config()
     label_config_line = config_line_stripped(open(c['label_config']).read())
     if analytics is None:
         analytics = Analytics(label_config_line, c.get('collect_analytics', True))
     else:
         analytics.update_info(label_config_line, c.get('collect_analytics', True))
+    # configure project
+    if project is None:
+        project = Project(label_config=label_config_line)
+    # configure machine learning backend
+    if ml_backend is None:
+        ml_backend_params = c.get('ml_backend')
+        if ml_backend_params:
+            ml_backend = MLBackend.from_params(ml_backend_params)
+            project.connect(ml_backend)
 
 
 @app.template_filter('json')
@@ -99,7 +118,7 @@ def index():
     task_id = request.args.get('task_id', None)
 
     if task_id is not None:
-        task_data = db.get_completions(task_id)
+        task_data = db.get_task_with_completions(task_id)
         if task_data is None:
             task_data = db.get_task(task_id)
 
@@ -128,25 +147,29 @@ def tasks_page():
                                  completed_at=completed_at)
 
 
-@app.route('/api/projects/1/next/', methods=['GET'])
+@app.route(f'/api/projects/{DEFAULT_PROJECT_ID}/next/', methods=['GET'])
 @exception_treatment
 def api_generate_next_task():
     """ Generate next task to label
     """
     # try to find a task that is not present in completions
     completions = db.get_completions_ids()
-    for (task_id, task) in db.get_tasks().items():
+    for task_id, task in db.iter_tasks():
         if task_id not in completions:
             log.info(msg='New task for labeling', extra=task)
             analytics.send(getframeinfo(currentframe()).function)
+            # try to use ml backend for predictions
+            if ml_backend:
+                task = deepcopy(task)
+                task['predictions'] = ml_backend.make_predictions(task, project)
             return make_response(jsonify(task), 200)
 
     # no tasks found
     analytics.send(getframeinfo(currentframe()).function, error=404)
     return make_response('', 404)
 
 
-@app.route('/api/projects/1/task_ids/', methods=['GET'])
+@app.route(f'/api/projects/{DEFAULT_PROJECT_ID}/task_ids/', methods=['GET'])
 @exception_treatment
 def api_all_task_ids():
     """ Get all tasks ids
@@ -162,13 +185,13 @@ def api_tasks(task_id):
     """ Get task by id
     """
     # try to get task with completions first
-    task_data = db.get_completions(task_id)
+    task_data = db.get_task_with_completions(task_id)
     task_data = db.get_task(task_id) if task_data is None else task_data
     analytics.send(getframeinfo(currentframe()).function)
     return make_response(jsonify(task_data), 200)
 
 
-@app.route('/api/projects/1/completions_ids/', methods=['GET'])
+@app.route(f'/api/projects/{DEFAULT_PROJECT_ID}/completions_ids/', methods=['GET'])
 @exception_treatment
 def api_all_completion_ids():
     """ Get all completion ids
@@ -190,6 +213,9 @@ def api_completions(task_id):
         completion.pop('state', None)  # remove editor state
         completion_id = db.save_completion(task_id, completion)
         log.info(msg='Completion saved', extra={'task_id': task_id, 'output': request.json})
+        # try to train model with new completions
+        if ml_backend:
+            ml_backend.update_model(db.get_task(task_id), completion, project)
         analytics.send(getframeinfo(currentframe()).function)
         return make_response(json.dumps({'id': completion_id}), 201)
 
@@ -236,15 +262,34 @@ def api_completion_update(task_id, completion_id):
     return make_response('ok', 201)
 
 
-@app.route('/api/projects/1/expert_instruction')
+@app.route(f'/api/projects/{DEFAULT_PROJECT_ID}/expert_instruction')
 @exception_treatment
 def api_instruction():
+    """ Instruction for annotators
+    """
     analytics.send(getframeinfo(currentframe()).function)
     return make_response(c['instruction'], 200)
 
 
+@app.route('/predict', methods=['POST'])
+@exception_treatment
+def api_predict():
+    """ Make ML prediction using ml_backend
+    """
+    task = request.json
+    if project.ml_backend:
+        predictions = project.ml_backend.make_predictions({'data': task}, project)
+        analytics.send(getframeinfo(currentframe()).function)
+        return make_response(jsonify(predictions), 200)
+    else:
+        analytics.send(getframeinfo(currentframe()).function, error=400)
+        return make_response(jsonify("No ML backend"), 400)
+
+
 @app.route('/data/<path:filename>')
-def get_image_file(filename):
+def get_data_file(filename):
+    """ External resource serving
+    """
     directory = request.args.get('d')
     return flask.send_from_directory(directory, filename, as_attachment=True)
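
The `deepcopy` added to `api_generate_next_task` earlier in this file keeps predictions out of the stored task. A simplified sketch of that flow follows; `StubBackend`, `next_task` and the prediction payload are hypothetical stand-ins, since the real `MLBackend` class lives in `utils/models.py`, which is not shown in this excerpt.

```python
from copy import deepcopy


class StubBackend:
    """Hypothetical stand-in for MLBackend; the returned shape is assumed."""

    def make_predictions(self, task, project):
        return [{"result": [], "score": 0.0}]


def next_task(tasks, completed_ids, ml_backend=None, project=None):
    """Return the first task without a completion, or None if all are done.

    When a backend is connected, predictions are attached to a copy, so the
    task held in `tasks` is left unchanged.
    """
    for task_id, task in tasks.items():
        if task_id not in completed_ids:
            if ml_backend:
                task = deepcopy(task)
                task["predictions"] = ml_backend.make_predictions(task, project)
            return task
    return None


tasks = {
    1: {"id": 1, "data": {"text": "first"}},
    2: {"id": 2, "data": {"text": "second"}},
}
served = next_task(tasks, completed_ids={1}, ml_backend=StubBackend())
print(served["id"])               # 2
print("predictions" in tasks[2])  # False: the stored task is untouched
```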
 

diff --git a/backend/static/images/slack.png b/backend/static/images/slack.png
diff --git a/backend/templates/header.html b/backend/templates/header.html
@@ -16,5 +16,17 @@
         <span class="delim">|</span>
 
         <a href="https://github.com/heartexlabs/label-studio" target="_blank"><img src="/static/images/github.svg" height="22"/></a>
+
+        <a href="https://docs.google.com/forms/d/e/1FAIpQLSdLHZx5EeT1J350JPwnY2xLanfmvplJi6VZk65C2R4XSsRBHg/viewform?usp=sf_link"
+           target="_blank"><img src="/static/images/slack.png" height="22"/></a>
+
+        <div class="fb-like"
+             style="top: -8px !important;"
+             data-href="https://www.facebook.com/heartexnet/"
+             data-width="" data-layout="button" data-action="like"
+             data-size="small" data-show-faces="false" data-share="false"
+             data-colorscheme="dark"></div>
+        <script async defer crossorigin="anonymous" src="https://connect.facebook.net/en_US/sdk.js#xfbml=1&version=v5.0&appId=1384721251840630&autoLogAppEvents=1"></script>
+
     </ul>
 </div>
diff --git a/backend/templates/index.html b/backend/templates/index.html
@@ -54,12 +54,8 @@
                     "completions:menu", // right menu with completion items
                     "side-column" // entity
                 ],
-                task: {
-                    id: {{ task_data["id"] }},
-                    data: {{ task_data["data"] | json | safe }},
-                    completions: {{ task_data["completions"] | json | safe }},
-                    // predictions: [],  // the same as completions but will be displayed in predictions section
-                }
+                task: {{ task_data | json | safe }}
+
             });
         {% else %}
             // Label stream mode
@@ -70,11 +66,12 @@
                 project: { id: 1 },
                 interfaces: [
                     "basic",
-                    "load",  // load next task
+                    "load",  // load next task automatically (label stream mode)
                     "panel",  // undo, redo, reset panel
                     "controls",  // all control buttons: skip, submit, update
                     "submit",  // submit button on controls
                     "predictions", // show predictions from task.predictions = [{...}, {...}]
+                    "predictions:menu", // right menu with prediction items
                     "completions",  // show completions
                     "side-column" // entity
                 ]