Description
Background
AssetOpsBench currently provides a suite of industrial scenarios (IoT, FSMR, TSFM, WO). To make the benchmark more accessible and allow evaluation across different agent architectures (such as those supported by Harbor), we need an adapter.
The Harbor Framework (https://harborframework.com/) specializes in evaluating agents in containerized environments. Adding an adapter will allow AssetOpsBench scenarios to be executed as Harbor tasks, leveraging Harbor's unified interface for benchmarking and logging.
Proposed Goals
Implement AssetOpsBenchAdapter: Create a new adapter within the AssetOpsBench ecosystem (or as a standalone contribution to Harbor's registry) that maps AssetOpsBench scenarios to Harbor's task format.
Environment Parity: Ensure the Docker-based environments used in AssetOpsBench (e.g., assetopsbench-basic) are compatible with Harbor’s BaseEnvironment.
Task Mapping: Map each AssetOpsBench scenario's task instructions, environment setup, and evaluation logic (LLM-as-a-judge or deterministic) onto Harbor's instruction.md, task.toml, and test.sh files.
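To make the task-mapping goal concrete, here is a minimal sketch of what the adapter's output step might look like. It is only an illustration under stated assumptions: the `Scenario` fields, the `write_harbor_task` helper, the flat directory layout, and every key written to task.toml are hypothetical placeholders, not Harbor's confirmed schema or AssetOpsBench's actual scenario format. Only the three file names come from the goals above.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Scenario:
    # Hypothetical fields; the real AssetOpsBench scenario schema may differ.
    scenario_id: str
    category: str      # e.g. "IoT", "FSMR", "TSFM", "WO"
    instruction: str   # natural-language task given to the agent
    docker_image: str  # e.g. "assetopsbench-basic"


def write_harbor_task(scenario: Scenario, out_root: Path) -> Path:
    """Write instruction.md, task.toml, and test.sh for one scenario.

    The layout and the TOML keys are assumptions made for illustration.
    """
    task_dir = out_root / f"assetopsbench-{scenario.scenario_id}"
    task_dir.mkdir(parents=True, exist_ok=True)

    # instruction.md: the task text shown to the agent.
    (task_dir / "instruction.md").write_text(
        scenario.instruction.strip() + "\n", encoding="utf-8"
    )

    # task.toml: json.dumps is used only to produce quoted, escaped strings,
    # which are also valid TOML basic strings for ordinary text.
    toml_lines = [
        "# Placeholder keys, not Harbor's confirmed schema.",
        "[metadata]",
        'source = "AssetOpsBench"',
        f"scenario_id = {json.dumps(scenario.scenario_id)}",
        f"category = {json.dumps(scenario.category)}",
        "",
        "[environment]",
        f"docker_image = {json.dumps(scenario.docker_image)}",
    ]
    (task_dir / "task.toml").write_text("\n".join(toml_lines) + "\n", encoding="utf-8")

    # test.sh: a stub that fails until the scenario's evaluator
    # (LLM-as-a-judge or deterministic) is wired in.
    test_script = "\n".join(
        [
            "#!/usr/bin/env bash",
            "# Placeholder: call the scenario's evaluation logic here.",
            f'echo "No evaluator wired in for {scenario.scenario_id}" >&2',
            "exit 1",
        ]
    )
    test_path = task_dir / "test.sh"
    test_path.write_text(test_script + "\n", encoding="utf-8")
    test_path.chmod(0o755)  # make the script executable

    return task_dir
```

Calling `write_harbor_task(scenario, Path("harbor_tasks"))` once per scenario would produce one directory per task. The stub test.sh exits non-zero on purpose, so an unported evaluator shows up as a failing task rather than a silent pass.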