
Commit ec49a5b

Fast agent + new notebooks + bug fixes
- [agent] the agent works much faster (bulk API methods added)
- [cookbook] small fixes + two new examples: filter project images by tag, plot tag distribution
- [SDK/geometry] fix polygon cropping
- [plugin/nn/YOLO] new Dockerfile based on the base-py image
- [guides] improved NN integration tutorial
- [SDK] minor fixes
1 parent f1a25e1 · commit ec49a5b


93 files changed: +1474 −480 lines


.gitignore (+1)

```diff
@@ -8,5 +8,6 @@
 tasks_data/*
 !tasks_data/.gitkeep
 
+
 **/params.local.sh
 */_account_private_registry.sh
```

README.md (−2)

```diff
@@ -120,5 +120,3 @@ Data Transformation Language allows to automate complicated pipelines of data tr
 Regular updates on how to use state of the art models and solve practical
 data science problems with Supervisely.
 - [Tutorials and Cookbooks](./help) in this repository.
-
-
```

agent/README.md (+31 −1)

```diff
@@ -23,4 +23,34 @@ This principal scheme illustrates how agent processes the task.
 
 [Here](https://docs.supervise.ly/cluster/add_delete_node/add_delete_node/) you will find documentation about how to monitor Agent status.
 
-![](https://i.imgur.com/rgihpsQ.png)
+![](https://i.imgur.com/rgihpsQ.png)
+
+
+# Environment variables:
+
+#### Required:
+
+- `AGENT_HOST_DIR`: directory where the agent stores user data. _(default: `$HOME/.supervisely-agent/$ACCESS_TOKEN`)_
+
+- `SERVER_ADDRESS`: full server URL to connect to (e.g. `http://somehost:12345/agent`).
+
+- `ACCESS_TOKEN`: unique string which allows the server to identify the agent.
+
+- `DOCKER_REGISTRY`: comma-separated list of the docker registry addresses in use (e.g. `docker.deepsystems.io,docker.enterprise.supervise.ly`).
+
+- `DOCKER_LOGIN`: comma-separated list of login names for those registries, in the same order (e.g. `user,user`).
+
+- `DOCKER_PASSWORD`: comma-separated list of passwords for those registries, in the same order (e.g. `123,345`).
+
+
+#### Optional:
+
+- `WITH_LOCAL_STORAGE`: whether to use local agent storage for long-term persistent storage of task results (learned model checkpoints, images generated by DTL) instead of uploading the results to the web instance storage. When this option is enabled, those results are unavailable while the agent is not connected to the web instance. Do not enable this option when running the agent on transient machines, such as hourly rented AWS instances, as the local data will be lost as soon as your rented time ends. _(default: true)_
+
+- `PULL_ALWAYS`: whether to always pull the docker image from the registry, or only when an image with the given name and tag is not found locally. _(default: true)_
+
+- `DEFAULT_TIMEOUTS`: whether to use the default timeout configs or load them from the `/workdir/src/configs/timeouts_for_stateless.json` file. _(default: true)_
+
+- `DELETE_TASK_DIR_ON_FINISH`: whether to remove the task directory after the task finishes successfully. _(default: true)_
+
+- `DELETE_TASK_DIR_ON_FAILURE`: whether to remove the task directory after the task finishes with a failure. _(default: false)_
```
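The three `DOCKER_*` variables form parallel, comma-separated lists: the n-th login and password belong to the n-th registry. A minimal sketch of how a launcher script might validate and pair them before starting the agent — the `parse_registry_credentials` helper is illustrative, not part of the agent code:

```python
import os

REQUIRED_VARS = ['SERVER_ADDRESS', 'ACCESS_TOKEN',
                 'DOCKER_REGISTRY', 'DOCKER_LOGIN', 'DOCKER_PASSWORD']


def parse_registry_credentials(environ=os.environ):
    # Fail fast if any required variable is missing.
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError('Missing required env vars: {}'.format(', '.join(missing)))

    registries = environ['DOCKER_REGISTRY'].split(',')
    logins = environ['DOCKER_LOGIN'].split(',')
    passwords = environ['DOCKER_PASSWORD'].split(',')

    # The lists are ordered alike, so they must have the same length.
    if not (len(registries) == len(logins) == len(passwords)):
        raise RuntimeError('DOCKER_REGISTRY, DOCKER_LOGIN and DOCKER_PASSWORD '
                           'must contain the same number of comma-separated items')

    # Pair each registry with its (login, password) tuple.
    return {reg.strip(): (login.strip(), pwd.strip())
            for reg, login, pwd in zip(registries, logins, passwords)}
```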

agent/VERSION (+1 −1)

```diff
@@ -1 +1 @@
-agent:4.1.1
+agent:4.2.0
```

agent/plugin_info.json (+1 −1)

```diff
@@ -1,5 +1,5 @@
 {
     "title": "Agent",
-    "description": "brief description",
+    "description": "Supervisely Agent is a small but powerful task manager that allows you to connect any computer (your office PC or a cloud server) to the platform and use it for any computational task: neural network training/inference/deployment, training data preparation and many more.",
     "type": "agent"
 }
```

agent/src/worker/agent.py (+3 −2)

```diff
@@ -60,14 +60,14 @@ def agent_connect_initially(self):
                                            shell=True, executable="/bin/bash",
                                            stdout=subprocess.PIPE).communicate()[0]
 
-        agent_info = {
+        self.agent_info = {
            'hardware_info': hw_info,
            'agent_image': json.loads(docker_img_info)["Config"]["Image"],
            'agent_image_digest': get_self_docker_image_digest()
        }
 
        self.api.simple_request('AgentConnected', sly.api_proto.ServerInfo,
-                               sly.api_proto.AgentInfo(info=json.dumps(agent_info)))
+                               sly.api_proto.AgentInfo(info=json.dumps(self.agent_info)))
 
    def send_connect_info(self):
        while True:
@@ -77,6 +77,7 @@ def send_connect_info(self):
    def get_new_task(self):
        for task in self.api.get_endless_stream('GetNewTask', sly.api_proto.Task, sly.api_proto.Empty()):
            task_msg = json.loads(task.data)
+           task_msg['agent_info'] = self.agent_info
            self.logger.debug('GET_NEW_TASK', extra={'task_msg': task_msg})
            self.start_task(task_msg)
 
```

agent/src/worker/data_manager.py (+44 −35)

```diff
@@ -58,7 +58,6 @@ def _split_images_by_cache(self, images):
 
     def download_project(self, parent_dir, name, datasets_whitelist=None):
         self.logger.info("DOWNLOAD_PROJECT", extra={'title': name})
-        #@TODO: reimplement and use path without splitting
         project_fs = sly.Project(os.path.join(parent_dir, name), sly.OpenMode.CREATE)
         project_id = self.public_api.project.get_info_by_name(self.workspace_id, name).id
         meta = sly.ProjectMeta.from_json(self.public_api.project.get_meta(project_id))
@@ -75,42 +74,50 @@ def download_project(self, parent_dir, name, datasets_whitelist=None):
 
     def download_dataset(self, dataset, dataset_id):
         images = self.public_api.image.get_list(dataset_id)
-        progress = sly.Progress('Download dataset {!r}: images'.format(dataset.name), len(images), self.logger)
+        progress_imgs = sly.Progress('Dataset {!r}: download images'.format(dataset.name), len(images), self.logger)
+        progress_anns = sly.Progress('Dataset {!r}: download annotations'.format(dataset.name), len(images), self.logger)
 
         images_to_download = images
+
+        # copy images from cache to task folder and download corresponding annotations
         if self.has_images_storage():
             images_to_download, images_in_cache, images_cache_paths = self._split_images_by_cache(images)
-            # copy images from cache to task folder
-            for img_info, img_cache_path in zip(images_in_cache, images_cache_paths):
-                dataset.add_item_file(img_info.name, img_cache_path)
-                progress.iter_done_report()
+            self.logger.info('Dataset {!r}'.format(dataset.name), extra={'total_images': len(images),
+                                                                         'images_in_cache': len(images_in_cache),
+                                                                         'images_to_download': len(images_to_download)})
+            if len(images_to_download) + len(images_in_cache) != len(images):
+                raise RuntimeError("Error with images cache during download. Please contact support.")
+            for batch_cache in sly.batched(list(zip(images_in_cache, images_cache_paths)), constants.BATCH_SIZE_GET_IMAGES_INFO()):
+                img_cache_ids = [img_info.id for img_info, _ in batch_cache]
+                ann_info_list = self.public_api.annotation.download_batch(dataset_id, img_cache_ids, progress_anns.iters_done_report)
+                img_name_to_ann = {ann.image_name: ann.annotation for ann in ann_info_list}
+                for img_info, img_cache_path in batch_cache:
+                    dataset.add_item_file(img_info.name, img_cache_path, img_name_to_ann[img_info.name])
+                    progress_imgs.iter_done_report()
 
         # download images from server
-        img_ids = []
-        img_paths = []
-        for img_info in images_to_download:
-            img_ids.append(img_info.id)
-            # TODO download to a temp file and use dataset api to add the image to the dataset.
-            img_paths.append(dataset.deprecated_make_img_path(img_info.name, img_info.ext))
+        for batch_download in sly.batched(images_to_download, constants.BATCH_SIZE_GET_IMAGES_INFO()):
+            # prepare lists for api methods
+            img_ids = []
+            img_paths = []
+            for img_info in batch_download:
+                img_ids.append(img_info.id)
+                # TODO download to a temp file and use dataset api to add the image to the dataset.
+                img_paths.append(dataset.deprecated_make_img_path(img_info.name, img_info.ext))
+
+            # download annotations
+            ann_info_list = self.public_api.annotation.download_batch(dataset_id, img_ids, progress_anns.iters_done_report)
+            img_name_to_ann = {ann.image_name: ann.annotation for ann in ann_info_list}
+            self.public_api.image.download_batch(dataset_id, img_ids, img_paths, progress_imgs.iters_done_report)
+            for img_info, img_path in zip(batch_download, img_paths):
+                dataset.add_item_file(img_info.name, img_path, img_name_to_ann[img_info.name])
 
-        self.public_api.image.download_batch(img_ids, img_paths, progress.iter_done_report)
-        for img_info, img_path in zip(images_to_download, img_paths):
-            dataset.add_item_file(img_info.name, img_path)
+            if self.has_images_storage():
+                progress_cache = sly.Progress('Dataset {!r}: cache images'.format(dataset.name), len(img_paths), self.logger)
+                img_hashes = [img_info.hash for img_info in batch_download]
+                self.storage.images.write_objects(img_paths, img_hashes, progress_cache.iter_done_report)
 
-        if self.has_images_storage():
-            progress = sly.Progress('Download dataset {!r}: cache images'.format(dataset.name), len(img_paths), self.logger)
-            img_hashes = [img_info.hash for img_info in images_to_download]
-            self.storage.images.write_objects(img_paths, img_hashes, progress.iter_done_report)
-
-        # download annotations from server
-        img_id_to_name = {image.id: image.name for image in images}
-        progress_ann = sly.Progress('Download dataset {!r}: annotations'.format(dataset.name), len(images), self.logger)
-        anns = self.public_api.annotation.get_list(dataset_id, progress_cb=progress_ann.iters_done_report)
-        for ann in anns:
-            img_name = img_id_to_name[ann.image_id]
-            dataset.set_ann_dict(img_name, ann.annotation)
-
-    #@TODO: remove legacy stuff
+    # @TODO: remove legacy stuff
     # @TODO: reimplement and use path without splitting
     def upload_project(self, parent_dir, project_name, new_title, legacy=False, add_to_existing=False):
         # @TODO: reimplement and use path without splitting
@@ -152,7 +159,7 @@ def upload_dataset(self, dataset, dataset_id):
                 hash_to_item_names[img_hash].append(item_name)
                 if self.has_images_storage():
                     if progress is None:
-                        progress = sly.Progress('Dataset {!r}: upload cache images'.format(dataset.name), items_count, self.logger)
+                        progress = sly.Progress('Dataset {!r}: cache images'.format(dataset.name), items_count, self.logger)
                     self.storage.images.write_object(item_paths.img_path, img_hash)
                     progress.iter_done_report()
 
@@ -163,18 +170,20 @@ def add_images_annotations(hashes, pb_img_cb, pb_ann_cb):
             names = [name for hash in hashes for name in hash_to_item_names[hash]]
             unrolled_hashes = [hash for hash in hashes for _ in range(len(hash_to_item_names[hash]))]
             ann_paths = [path for hash in hashes for path in hash_to_ann_paths[hash]]
-            remote_ids = self.public_api.image.add_batch(dataset_id, names, unrolled_hashes, pb_img_cb)
-            self.public_api.annotation.add_batch(remote_ids, ann_paths, pb_ann_cb)
+            remote_infos = self.public_api.image.add_batch(dataset_id, names, unrolled_hashes, pb_img_cb)
+            self.public_api.annotation.upload_batch_paths(dataset_id, [info.id for info in remote_infos], ann_paths, pb_ann_cb)
 
         # add already uploaded images + attach annotations
         remote_hashes = self.public_api.image.check_existing_hashes(list(hash_to_img_paths.keys()))
-        add_images_annotations(remote_hashes, progress_img.iter_done_report, progress_ann.iter_done_report)
+        if len(remote_hashes) > 0:
+            add_images_annotations(remote_hashes, progress_img.iters_done_report, progress_ann.iters_done_report)
 
         # upload new images + add annotations
         new_hashes = list(set(hash_to_img_paths.keys()) - set(remote_hashes))
         img_paths = [path for hash in new_hashes for path in hash_to_img_paths[hash]]
-        self.public_api.image.upload_batch(img_paths, progress_img.iter_done_report)
-        add_images_annotations(new_hashes, None, progress_ann.iter_done_report)
+        self.public_api.image.upload_batch_paths(img_paths, progress_img.iters_done_report)
+        if len(new_hashes) > 0:
+            add_images_annotations(new_hashes, None, progress_ann.iters_done_report)
 
     def upload_archive(self, task_id, dir_to_archive, archive_name):
         self.logger.info("PACK_TO_ARCHIVE ...")
```

agent/src/worker/fs_storages.py (+2)

```diff
@@ -72,6 +72,8 @@ def scan_deeper(paths):
         return obj_pathes_suffixes
 
     def get_storage_path(self, data_hash, suffix=''):
+        if suffix:
+            suffix = ".{}".format(suffix).replace("..", ".")
         st_hash = hashlib.sha256(data_hash.encode('utf-8')).hexdigest()
         st_path = osp.join(self._storage_root, st_hash[0:2], st_hash[2:5], st_hash + suffix)
         return st_path
```
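For context, `get_storage_path` shards cached objects into a two-level directory tree keyed by the SHA-256 of the data hash, and the added lines normalize the suffix so that `'png'` and `'.png'` both end up with a single leading dot. A self-contained sketch of the same computation (the storage root and sample hash are illustrative):

```python
import hashlib
import os.path as osp


def get_storage_path(storage_root, data_hash, suffix=''):
    # Normalize the suffix to exactly one leading dot, mirroring the fix above.
    if suffix:
        suffix = ".{}".format(suffix).replace("..", ".")
    st_hash = hashlib.sha256(data_hash.encode('utf-8')).hexdigest()
    # Shard by the first 2 and next 3 hex chars to keep directories small.
    return osp.join(storage_root, st_hash[0:2], st_hash[2:5], st_hash + suffix)


# Prints something like <root>/<2 hex chars>/<3 hex chars>/<full sha256>.png
print(get_storage_path('/sly_agent/storage/images', 'abc123', 'png'))
```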

agent/src/worker/task_clean_node.py (+1 −2)

```diff
@@ -8,7 +8,6 @@
 from worker.task_sly import TaskSly
 from worker.agent_utils import TaskDirCleaner
 from worker import constants
-from worker.utils import batched
 
 
 class TaskCleanNode(TaskSly):
@@ -40,7 +39,7 @@ def get_dataset_images_hashes(self, dataset_id):
         image_array = self.api.simple_request('GetDatasetImages', sly.api_proto.ImageArray, sly.api_proto.Id(id=dataset_id))
         img_hashes = []
 
-        for batch_img_ids in batched(list(image_array.images), constants.BATCH_SIZE_GET_IMAGES_INFO()):
+        for batch_img_ids in sly.batched(list(image_array.images), constants.BATCH_SIZE_GET_IMAGES_INFO()):
             images_info_proto = self.api.simple_request('GetImagesInfo', sly.api_proto.ImagesInfo,
                                                         sly.api_proto.ImageArray(images=batch_img_ids))
             img_hashes.extend([(info.hash, info.ext) for info in images_info_proto.infos])
```

agent/src/worker/task_dockerized.py (+6 −2)

```diff
@@ -3,7 +3,7 @@
 from enum import Enum
 from threading import Lock
 import json
-from docker.errors import ImageNotFound as DockerImageNotFound
+from docker.errors import DockerException, ImageNotFound as DockerImageNotFound
 
 import supervisely_lib as sly
 
@@ -112,7 +112,11 @@ def _docker_pull(self):
         self.logger.info('Docker image will be pulled', extra={'image_name': self.docker_image_name})
         progress_dummy = sly.Progress('Pulling image...', 1, ext_logger=self.logger)
         progress_dummy.iter_done_report()
-        pulled_img = self._docker_api.images.pull(self.docker_image_name)
+        try:
+            pulled_img = self._docker_api.images.pull(self.docker_image_name)
+        except DockerException:
+            raise DockerException('Unable to pull image: not enough free disk space or something wrong with DockerHub.'
+                                  ' Please, run the task again or email support.')
         self.logger.info('Docker image has been pulled', extra={'pulled': {'tags': pulled_img.tags, 'id': pulled_img.id}})
 
     def _docker_image_exists(self):
```

agent/src/worker/task_sly.py (+1)

```diff
@@ -21,6 +21,7 @@ def init_api(self):
 
     def report_start(self):
         self.logger.info('TASK_START', extra={'event_type': sly.EventType.TASK_STARTED})
+        self.logger.info('TASK_MSG', extra=self.info)
 
     def task_main_func(self):
         raise NotImplementedError()
```

agent/src/worker/utils.py (−6)

This file was deleted.

base_images/jupyterlab/VERSION (+1 −1)

```diff
@@ -1 +1 @@
-base-jupyterlab:4.1.0
+base-jupyterlab:4.2.0
```

base_images/py/VERSION (+1 −1)

```diff
@@ -1 +1 @@
-base-py:4.1.0
+base-py:4.2.0
```

base_images/pytorch/VERSION (+1 −1)

```diff
@@ -1 +1 @@
-base-pytorch:4.1.0
+base-pytorch:4.2.0
```

base_images/pytorch_v04/VERSION (+1 −1)

```diff
@@ -1 +1 @@
-base-pytorch-v04:4.1.0
+base-pytorch-v04:4.2.0
```

base_images/tensorflow/VERSION (+1 −1)

```diff
@@ -1 +1 @@
-base-tensorflow:4.1.0
+base-tensorflow:4.2.0
```

help/jupyterlab_scripts/VERSION (+1 −1)

```diff
@@ -1 +1 @@
-jupyterlab:4.1.1
+jupyterlab:4.2.0
```
