diff --git a/Dockerfile b/Dockerfile index f0be16bc0160..620177a084e5 100644 --- a/Dockerfile +++ b/Dockerfile @@ -8,10 +8,12 @@ COPY requirements.txt /label-studio RUN pip install -r requirements.txt ENV PORT="8080" -ENV collect_analytics=0 +ENV PROJECT_NAME=my_project + EXPOSE ${PORT} COPY . /label-studio -RUN pip install -e . -CMD ["label-studio", "start", "my_project", "--init", "--no-browser", "--port", "8080"] +RUN python setup.py develop + +CMD ["./tools/run.sh"] diff --git a/MANIFEST.in b/MANIFEST.in index 5d2e0ae02368..b3852007925f 100644 --- a/MANIFEST.in +++ b/MANIFEST.in @@ -3,4 +3,5 @@ recursive-include label_studio/static * include label_studio/templates/*.html include label_studio/utils/schema/*.json include label_studio/logger.json -include label_studio/config.json \ No newline at end of file +include label_studio/config.json +include label_studio/ml/default_configs/* \ No newline at end of file diff --git a/README.md b/README.md index f709960b00a9..53fc6377f51a 100644 --- a/README.md +++ b/README.md @@ -61,6 +61,20 @@ pip install lxml‑4.5.0‑cp38‑cp38‑win_amd64.whl pip install label-studio ``` +#### Install from Anaconda + +```bash +conda create --name label-studio python=3.8 +conda activate label-studio +pip install label-studio +``` + +If you see any errors during installation, try to rerun installation + +```bash +pip install --ignore-installed label-studio +``` + #### Local development Running the latest Label Studio version locally without installing package from pip could be done by: ```bash @@ -75,7 +89,7 @@ python label-studio/server.py start labeling_project --init ## Run docker You can also start serving at `http://localhost:8080` by using docker: ```bash -docker run --rm -p 8080:8080 -v `pwd`/my_project:/label-studio/my_project --name label-studio heartexlabs/label-studio:latest label-studio start my_project --init --host 0.0.0.0 +docker run --rm -p 8080:8080 -v `pwd`/my_project:/label-studio/my_project --name label-studio heartexlabs/label-studio:latest label-studio start my_project --init ``` By default, it starts blank project in `./my_project` directory. @@ -85,7 +99,7 @@ By default, it starts blank project in `./my_project` directory. You can override the default startup command by appending: ```bash -docker run -p 8080:8080 -v `pwd`/my_project:/label-studio/my_project --name label-studio heartexlabs/label-studio:latest label-studio start my_project --init --force --template image_mixedlabel --host 0.0.0.0 +docker run -p 8080:8080 -v `pwd`/my_project:/label-studio/my_project --name label-studio heartexlabs/label-studio:latest label-studio start my_project --init --force --template text_classification ``` If you want to build a local image, run: @@ -161,37 +175,17 @@ The list of supported use cases for data annotation. Please contribute your own ## Machine Learning Integration -You can easily connect your favorite machine learning framework with Label Studio by using [Heartex SDK](https://github.com/heartexlabs/pyheartex). +You can easily connect your favorite machine learning framework with Label Studio Machine Learning SDK. It's done in the simple 2 steps: +1. Start your own ML backend server ([check here for detailed instructions](label_studio/ml/README.md)), +2. Connect Label Studio to the running ML backend on [/model](http://localhost:8080/model.html) page That gives you the opportunities to use: -- **Pre-labeling**: Use model predictions for pre-labeling +- **Pre-labeling**: Use model predictions for pre-labeling (e.g. make use on-the-fly model predictions for creating rough image segmentations for further manual refinements) +- **Autolabeling**: Create automatic annotations - **Online Learning**: Simultaneously update (retrain) your model while new annotations are coming -- **Active Learning**: Perform labeling in active learning mode +- **Active Learning**: Perform labeling in active learning mode - select only most complex examples - **Prediction Service**: Instantly create running production-ready prediction service -There is a quick example tutorial on how to do that with simple image classification: - -1. Clone pyheartex, and start serving example image classifier ML backend at `http://localhost:9090` - ```bash - git clone https://github.com/heartexlabs/pyheartex.git - cd pyheartex/examples/docker - docker-compose up -d - ``` - -2. Run Label Studio project specifying ML backend URLs: - - ```bash - label-studio start imgcls --init --template image_classification \ - --ml-backend-url http://localhost:9090 --ml-backend-name my_model - ``` - -Once you're satisfied with pre-labeling results, you can immediately send prediction requests via REST API: -```bash -curl -X POST -H 'Content-Type: application/json' -d '{"image_url": "https://go.heartex.net/static/samples/sample.jpg"}' http://localhost:8080/predict -``` - -Feel free to play around any other models & frameworks apart from image classifiers! (see instructions [here](https://github.com/heartexlabs/pyheartex#advanced-usage)) - ## Label Studio for Teams, Startups, and Enterprises :office: Label Studio for Teams is our enterprise edition (cloud & on-prem), that includes a data manager, high-quality baseline models, active learning, collaborators support, and more. Please visit the [website](https://www.heartex.ai/) to learn more. @@ -205,6 +199,22 @@ Label Studio for Teams is our enterprise edition (cloud & on-prem), that include | [label-studio-converter](https://github.com/heartexlabs/label-studio-converter) | Encode labels into the format of your favorite machine learning library | | [label-studio-transformers](https://github.com/heartexlabs/label-studio-transformers) | Transformers library connected and configured for use with label studio | +## Citation + +```tex +@misc{Label Studio, + title={{Label Studio}: A Swiss Army Knife of Data Labeling and Annotation Tools}, + url={https://github.com/heartexlabs/label-studio}, + note={Open source software available from https://github.com/heartexlabs/label-studio}, + author={ + Maxim Tkachenko and + Mikhail Malyuk and + Nikita Shevchenko and + Nikolai Liubimov}, + year={2020}, +} +``` + ## License This software is licensed under the [Apache 2.0 LICENSE](/LICENSE) © [Heartex](https://www.heartex.ai/). 2020 diff --git a/app.json b/app.json index a51b7af7632b..92958696d018 100644 --- a/app.json +++ b/app.json @@ -1,7 +1,9 @@ { + "name": "Label Studio", "description": "Multi-type data labeling, annotation and exploration tool", "keywords": ["data annotation", "data labeling"], "website": "https://labelstud.io", "repository": "https://github.com/heartexlabs/label-studio", - "logo": "https://labelstud.io/images/opossum/heartex_icon_opossum_green.svg" + "logo": "https://labelstud.io/images/opossum/heartex_icon_opossum_green.svg", + "stack": "container" } diff --git a/docker-compose.yml b/docker-compose.yml index 845c4894fdb5..c0ca4bc32e01 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -7,7 +7,7 @@ services: working_dir: /label-studio volumes: - ./my_project:/label-studio/my_project - command: "label-studio start my_project ${INIT_COMMAND} " + command: "label-studio start my_project ${INIT_COMMAND} --host 0.0.0.0" ports: - "8080:8080" restart: always diff --git a/docs/source/guide/ml.md b/docs/source/guide/ml.md index 7da20765832f..b04f057074af 100644 --- a/docs/source/guide/ml.md +++ b/docs/source/guide/ml.md @@ -4,12 +4,13 @@ type: guide order: 906 --- -You can easily connect your favorite machine learning framework with Label Studio by using [Heartex SDK](https://github.com/heartexlabs/pyheartex). +You can easily connect your favorite machine learning framework with Label Studio Machine Learning SDK. That gives you the opportunities to use: -- **Pre-labeling**: Use model predictions for pre-labeling +- **Pre-labeling**: Use model predictions for pre-labeling (e.g. make use on-the-fly model predictions for creating rough image segmentations for further manual refinements) +- **Autolabeling**: Create automatic annotations - **Online Learning**: Simultaneously update (retrain) your model while new annotations are coming -- **Active Learning**: Perform labeling in active learning mode +- **Active Learning**: Perform labeling in active learning mode - select only most complex examples - **Prediction Service**: Instantly create running production-ready prediction service @@ -21,28 +22,37 @@ That gives you the opportunities to use: ## Quickstart -Here is a quick example tutorial on how to do that with simple text classification: +Here is a quick example tutorial on how to run the ML backend with a simple text classifier: 0. Clone repo ```bash git clone https://github.com/heartexlabs/label-studio ``` -1. Create new ML backend +1. Setup environment + ```bash + cd label-studio + pip install -e . + cd label_studio/ml/examples + pip install -r requirements.txt + ``` + +2. Create new ML backend ```bash label-studio-ml init my_ml_backend --script label-studio/ml/examples/simple_text_classifier.py ``` -2. Start ML backend server +3. Start ML backend server ```bash label-studio-ml start my_ml_backend ``` -3. Run Label Studio connecting it to the running ML backend: +4. Run Label Studio connecting it to the running ML backend: ```bash label-studio start text_classification_project --init --template text_sentiment --ml-backend-url http://localhost:9090 ``` + ## Create your own ML backend Check examples in `label-studio/ml/examples` directory. \ No newline at end of file diff --git a/docs/source/guide/tasks.md b/docs/source/guide/tasks.md index 3b55a444085c..1e5e68467f15 100644 --- a/docs/source/guide/tasks.md +++ b/docs/source/guide/tasks.md @@ -78,6 +78,7 @@ Here is an example of a config and tasks list composed of one element, for text "choices": ["Neutral"] } }], + # score is used for active learning sampling mode "score": 0.95 }] }] @@ -146,13 +147,15 @@ You can split your input data into several plain text files, and specify the dir ### Directory with image files ```bash -label-studio init --input-path=dir/with/images --input-format=image-dir --label-config=config.xml +label-studio init --input-path=dir/with/images --input-format=image-dir --label-config=config.xml --allow-serving-local-files ``` +> WARNING: "--allow-serving-local-files" is intended to use only for locally running instances: avoid using it for remote servers unless you are sure what you're doing. + You can point to a local directory, which is scanned recursively for image files. Each file is used to create one task. Since Label Studio works only with URLs, a web link is created for each task, pointing to your local directory as follows: ``` -http:///static/filename?d= +http:///data/filename?d= ``` Supported formats are: `.png` `.jpg` `.jpeg` `.tiff` `.bmp` `.gif` @@ -160,13 +163,15 @@ Supported formats are: `.png` `.jpg` `.jpeg` `.tiff` `.bmp` `.gif` ### Directory with audio files ```bash -label-studio init --input-path=my/audios/dir --input-format=audio-dir --label-config=config.xml +label-studio init --input-path=my/audios/dir --input-format=audio-dir --label-config=config.xml --allow-serving-local-files ``` +> WARNING: "--allow-serving-local-files" is intended to use only for locally running instances: avoid using it for remote servers unless you are sure what you're doing. + You can point to a local directory, which is scanned recursively for audio files. Each file is used to create one task. Since Label Studio works only with URLs, a web link is created for each task, pointing to your local directory as follows: ``` -http:///static/filename?d= +http:///data/filename?d= ``` Supported formats are: `.wav` `.aiff` `.mp3` `.au` `.flac` @@ -180,3 +185,23 @@ Use API to import tasks in [Label Studio basic format](tasks.html#Basic-format) curl -X POST -H Content-Type:application/json http://localhost:8080/api/import \ --data "[{\"my_key\": \"my_value_1\"}, {\"my_key\": \"my_value_2\"}]" ``` + +## Sampling + +You can define the way of how your imported tasks are exposed to annotators. Several options are available. To enable one of them, specify `--sampling=