Requirements:
- Linux or macOS
- Python 3.10.10+ or 3.11
Install the latest model wrapper by tag:
pip install git+https://github.com/acceleratedscience/openad-service-utils.git@<tag>
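To confirm the install, you can try importing the package. The module name openad_service_utils is assumed here, matching the examples below:

import openad_service_utils  # should import without error after installation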
The wrapper lets you:
- Wrap your model(s) and use them in the OpenAD toolkit, just like other OpenAD models.
- Wrap your model(s) in a quick and easy FastAPI service, without having to write all the FastAPI endpoints and return values from scratch.
- Containerize your model for Kubernetes (e.g. OpenShift), Docker, Podman, etc. The wrapper keeps your container code simple and follows the same standard as the other models in the OpenAD model service.
The simplest case for wrapping a model is when the model code already handles downloading any assets it needs. A good example of this is Biomed Omics Protein Solubility: https://huggingface.co/ibm-research/biomed.omics.bl.sm.ma-ted-458m.protein_solubility
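As a sketch of what wrapping looks like, the snippet below subclasses the wrapper's SimplePredictor interface. The class attributes and method names follow the wrapper's predictor API; the algorithm name, property, and scoring logic are hypothetical placeholders, not a real model:

from typing import Any, List

from openad_service_utils import (
    DomainSubmodule,
    PredictorTypes,
    PropertyInfo,
    SimplePredictor,
    start_server,
)

class MySimplePredictor(SimplePredictor):
    # Metadata that identifies the model; it also determines where
    # assets are cached under ~/.openad_models (see paths below).
    domain: DomainSubmodule = DomainSubmodule("molecules")
    algorithm_name: str = "mypredictor"
    algorithm_application: str = "mysimplepredictor"
    algorithm_version: str = "v0"
    property_type: PredictorTypes = PredictorTypes.MOLECULE
    available_properties: List[PropertyInfo] = [
        PropertyInfo(name="esol", description="hypothetical solubility score")
    ]

    def setup(self):
        # Load your model here; runs once before serving inference.
        pass

    def predict(self, sample: Any):
        # Replace with your model's real inference call.
        return 0.0

if __name__ == "__main__":
    MySimplePredictor.register()  # register the predictor with the service
    start_server(port=8080)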
To use your own private cloud object store for model assets, set the following variables to point at your private S3 buckets, either in the host OS environment or in your Python script:
import os
os.environ["OPENAD_S3_HOST"] = "s3.<region>.amazonaws.com"
os.environ["OPENAD_S3_ACCESS_KEY"] = ""
os.environ["OPENAD_S3_SECRET_KEY"] = ""
os.environ["OPENAD_S3_HOST_HUB"] = "s3.<region>.amazonaws.com"
os.environ["OPENAD_S3_ACCESS_KEY_HUB"] = ""
os.environ["OPENAD_S3_SECRET_KEY_HUB"] = ""
Set ASYNC_ALLOW to True to configure your inference service as asynchronous. With this enabled, each inference job is assigned a unique, random id using UUID. The system stores the result for 3 days before deleting it, and the inference user can request the result with the job id at any time within those 3 days.
import os
os.environ["ASYNC_ALLOW"] = True
Example:
OpenAD:DEFAULT >> pserve generate with MySimpleGenerator data for "{'<esol>': -3.2}" sample 4 async
✔ Request Returned
{'id': '8c2cfb68-b037-11ef-9223-acde48001122'}
OpenAD:DEFAULT >> get model service 'pserve' result '8c2cfb68-b037-11ef-9223-acde48001122'
job is still running
OpenAD:DEFAULT >> get model service 'pserve' result '8c2cfb68-b037-11ef-9223-acde48001122'
pred1    pred2
-------  -------
1        2
Next up, you can run: result open/edit/copy/display/as dataframe/save [as '<filename.csv>']
The following experimental or advanced settings can be set as environment variables:
- Clears the GPU memory for an inference call.
  Default: AUTO_CLEAR_GPU_MEM: bool = True
- Calls the garbage collector after an inference call.
  Default: AUTO_GARABAGE_COLLECT: bool = True
- Enables multi-processing of synchronous calls. It defaults to a single worker for safety; whether to use more depends on your performance sizing.
  Default: SERVE_MAX_WORKERS: int = -1
- Enables caching of results for command requests. This should only be activated for deterministic requests; never activate it for functions that use random seeds.
  Default: ENABLE_CACHE_RESULTS: bool = False
- Sets the pool size for asynchronous requests. The default is 1 so that server capacity is managed to the minimum; it is up to the developer and deployer of a service to set it higher based on benchmarking.
  Default: ASYNC_POOL_MAX: int = 1
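For example, to override a couple of these defaults before starting the service (the variable names are as listed above; the values are purely illustrative):

import os

# Environment variables must be set as strings before the service starts.
os.environ["AUTO_CLEAR_GPU_MEM"] = "False"  # keep GPU memory allocated between calls
os.environ["SERVE_MAX_WORKERS"] = "4"       # illustrative value; size via benchmarking
os.environ["ASYNC_POOL_MAX"] = "2"          # illustrative value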
Model files are stored on disk using the following directory structure:

~/.openad_models/algorithms/<algorithm_type>/<algorithm_name>/<algorithm_application>/<algorithm_version>
~/.openad_models/properties/<domain>/<algorithm_name>/<algorithm_application>/<algorithm_version>
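For instance, the hypothetical MySimplePredictor sketched earlier would resolve to a path like the following, with every segment taken from its class attributes:

~/.openad_models/properties/molecules/mypredictor/mysimplepredictor/v0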