Add an RFC For Job Execution Plugins to Enable Online Custom Scorers #2

Open
mprahl wants to merge 3 commits into mlflow:main from mprahl:online-scoring-plugin

Conversation


@mprahl mprahl commented Mar 20, 2026

This is the core design. The follow-up for remote scorers to support custom scoring securely is in #3.

Author

mprahl commented Mar 20, 2026

@B-Step62 @etirelli @TomeHirata could you please review this?

@mprahl mprahl force-pushed the online-scoring-plugin branch from 8a013d6 to 511c662 Compare March 20, 2026 18:07
mprahl added a commit to mprahl/mlflow-rfcs that referenced this pull request Mar 20, 2026
This depends on mlflow#2 and adds safe online scoring for custom scorers.

Signed-off-by: mprahl <mprahl@users.noreply.github.com>
Co-authored-by: Humair Khan <HumairAK@users.noreply.github.com>
Signed-off-by: mprahl <mprahl@users.noreply.github.com>
@mprahl mprahl force-pushed the online-scoring-plugin branch from 511c662 to 3524afe Compare March 20, 2026 18:11
mprahl added a commit to mprahl/mlflow-rfcs that referenced this pull request Mar 20, 2026
This depends on mlflow#2 and adds safe online scoring for custom scorers.

Signed-off-by: mprahl <mprahl@users.noreply.github.com>
def recover_jobs(self, unfinished_job_ids: list[str]) -> list[JobRecoveryResult]: ...

@property
def scorer_capabilities(self) -> ScorerCapability: ... # defaults to NONE and participates in backend routing
@TomeHirata TomeHirata Mar 23, 2026

A job is a higher abstraction than a scorer execution job, and it's a bit odd that a job executor has a property for what scorer type is supported. If the intention is to tell if UDF is supported by the backend or not, can we have a boolean flag like is_udf_supported, or more generally capabilities property that returns ["UDF"]. Also, I wonder if we need this property from the beginning. Any job executor should be able to execute any Python function, and it just has a different resource isolation level. For local development, users are free to use SubprocessJobExecutor, and for the remote tracking server, they can just switch to DockerJobExecutor/K8sJobExecutor, and this property is not used.

Author

Good point. We don't want to restrict to just "scorer" jobs, so we could make this a generic capabilities property.

The reason why I had this was mostly to be able to automatically block custom scorer code if the job executor did not have isolation capabilities. On second thought, we can just let the admin opt in to custom scorers explicitly with an environment variable and/or mlflow server CLI flag.

I'll make that change but let me know if you have a different idea.
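The admin opt-in mentioned above could be sketched as follows. This is only an illustration: the environment variable name and helper functions are hypothetical, not the flag the RFC will finalize.

```python
import os

# Hypothetical opt-in gate: the variable name and helpers are illustrative,
# not the final flag proposed for `mlflow server`.
ALLOW_CUSTOM_SCORERS_VAR = "MLFLOW_ALLOW_CUSTOM_SCORER_CODE"

def custom_scorer_code_allowed() -> bool:
    """True only when the admin explicitly opted in via the environment."""
    return os.environ.get(ALLOW_CUSTOM_SCORERS_VAR, "false").lower() in ("1", "true")

def validate_scorer_submission(uses_custom_code: bool) -> None:
    # reject custom scorer code unless the deployment explicitly enabled it
    if uses_custom_code and not custom_scorer_code_allowed():
        raise PermissionError(
            f"Custom scorer code is disabled; set {ALLOW_CUSTOM_SCORERS_VAR}=true to enable"
        )
```

The same check could back an `mlflow server` CLI flag that simply sets the environment variable before workers start.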


> On second thought, we can just let the admin opt in to custom scorers explicitly with an environment variable and/or mlflow server CLI flag.

Good point, I prefer to remove capabilities from the job executor interface altogether and support routing or executor<>job type mapping later if explicitly requested.

Comment on lines +183 to +184
tracking_uri: str
gateway_uri: str | None = None # optional MLflow AI Gateway base URI reachable from the job runtime


These two URIs should be identical.

Author

I thought you could start the AI Gateway separately with mlflow gateway start, so this was to account for the case where the tracking server and gateway are deployed on separate servers. If that's not common, I can remove it.

Let me know your preference!


mlflow gateway start is a legacy gateway product that we don't promote anymore. The new gateway feature exposes the gateway endpoints on the tracking server directly, so I'd recommend we just ask users to set tracking_uri only.

- `remote_execution` answers whether the job runs through the local direct-store path or through the remote executor contract

This distinction matters for `optimize_prompts_job`. It is not arbitrary custom Python in the same way that a custom


I'd refactor optimize_prompts_job rather than defining the job executor interface based on how optimize_prompts_job works. Btw, this is also an issue for online scorers that don't use MLflow gateway.

Author

@TomeHirata, my original thought was to not cause a breaking change for non-gateway users. I also don't think you can do online scoring without the MLflow AI Gateway today, but you can do one-off evaluations through the UI using direct. I may be wrong about that.

So maybe, if the plugin has remote_execution() return True, we can disallow non-gateway usage. Then for existing users, it's not a breaking change, because they would still use the default subprocess executor backend which does support local.
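The proposal above could look like the following sketch. The `remote_execution` flag and the `gateway:/` URI scheme come from the discussion; the function name and exact check are illustrative.

```python
# Sketch of the floated validation: when the configured executor reports
# remote execution, reject judge models that are not gateway-backed.
# The function name and message are illustrative, not the RFC's final API.
def validate_judge_model(remote_execution: bool, model_uri: str) -> None:
    if remote_execution and not model_uri.startswith("gateway:/"):
        raise ValueError(
            "Remote job executors require gateway-backed models; "
            f"got {model_uri!r} instead of a 'gateway:/...' URI"
        )
```

With this shape, existing non-gateway users on the default subprocess executor hit the `remote_execution=False` path and see no behavior change.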


> I also don't think you can do online scoring without the MLflow AI Gateway today

This is true for the UI path, but I believe users can register judges via the Python API.

> So maybe, if the plugin has remote_execution() return True, we can disallow non-gateway usage

Yeah, I can try adding this validation if that's not difficult. Otherwise, we can document the limitation and raise an error at runtime if the direct provider API is used and the secret is missing.


- **Job row claim**: the worker's conditional `PENDING -> RUNNING` transition that gives one MLflow instance ownership
of a queued job row
- **Exclusivity lock**: the higher-level lock stored in `job_locks`, typically for a key such as an experiment ID
@TomeHirata TomeHirata Mar 23, 2026

Do we have any concrete use cases for this higher level locking (e.g., experiment id)?

Author

This is to support the exclusive argument in the job decorator for run_online_trace_scorer_job and run_online_session_scorer_job. They don't allow multiple concurrent runs per experiment ID. By bringing this to a database-level lock, we can replicate the locking that exists in Huey today, except it would now support multiple MLflow replicas.
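A database-level exclusivity lock of this kind is commonly built on a unique constraint. The sketch below uses SQLite purely for illustration (the RFC itself notes SQLite is not a safe multi-replica foundation); the `job_locks` schema and function names are assumptions, but the insert-under-unique-key pattern carries over to PostgreSQL/MySQL/MSSQL.

```python
import sqlite3

# Minimal sketch of a database-level exclusivity lock keyed by experiment ID,
# standing in for the proposed job_locks table; the schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_locks (lock_key TEXT PRIMARY KEY, owner TEXT)")

def try_acquire(conn, lock_key, owner):
    try:
        # the PRIMARY KEY constraint makes acquisition atomic across replicas
        conn.execute("INSERT INTO job_locks VALUES (?, ?)", (lock_key, owner))
        conn.commit()
        return True
    except sqlite3.IntegrityError:  # another holder already has the key
        return False

def release(conn, lock_key, owner):
    conn.execute(
        "DELETE FROM job_locks WHERE lock_key = ? AND owner = ?", (lock_key, owner)
    )
    conn.commit()
```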


Got it. @dbczumar, what was the main motivation to bring the resource-based exclusion to the online scorer?


Concurrent job executions could result in duplicate logs (e.g., MLflow assessments) for jobs that process traces from a particular experiment, such as the online scoring job.
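The job row claim itself (the conditional `PENDING -> RUNNING` transition) reduces to a single guarded UPDATE whose affected-row count tells a worker whether it won. SQLite and the table shape below are illustrative only; the pattern is the same on the transactional databases the RFC targets.

```python
import sqlite3

# Sketch of the conditional PENDING -> RUNNING job-row claim: one guarded
# UPDATE so exactly one replica wins the row. The table shape is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_id TEXT PRIMARY KEY, status TEXT, owner TEXT)")
conn.execute("INSERT INTO jobs VALUES ('job-1', 'PENDING', NULL)")
conn.commit()

def try_claim(conn, job_id, worker):
    cur = conn.execute(
        "UPDATE jobs SET status = 'RUNNING', owner = ? "
        "WHERE job_id = ? AND status = 'PENDING'",
        (worker, job_id),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 means another worker claimed the row first
```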

- **Job lease**: the short-lived `RUNNING`-job lease tracked by `lease_expires_at`, used to detect stale monitored work
- **Scheduler lease**: the single-leader discovery lease stored in `scheduler_leases`

`JobLockManager` replaces Huey's lock helper and keeps the existing lock key computation model. Lock acquisition is an
@TomeHirata TomeHirata Mar 23, 2026

How do we plan to implement the job queue based on the job table and the job_locks table in a multi-replica setting? Implementing a high-performing multi-process queue is non-trivial, and that's part of why huey was selected instead of a database-based job queue implementation.

@HumairAK HumairAK Mar 23, 2026

I'm wondering about offloading this capability to a third party. However, Huey's current support for distributed task-based queuing seems mostly limited to Redis, which adds another major dependency. We would like to give users the option to leverage their existing MLflow DB to reduce deployment overhead, but Huey doesn't seem to have proper support for SQL-based DBs.

We will evaluate the feasibility of leveraging this or something similar for task-based queuing and locking and follow up.


So we looked into other alternatives. We were unable to find a strong existing alternative that cleanly fits all of our requirements: no additional infrastructure beyond the existing MLflow DB, multi-replica execution, SQL-backed durability/coordination, portability across PostgreSQL/MySQL/MSSQL, and preserving a built-in OSS/local experience.

The alternatives we discussed each seem to miss at least one of those constraints. Huey has SQL-backed storage, but not the broader distributed coordination model we need here. Celery would introduce an external broker. Other alternatives are specific to databases like PostgreSQL.

So given the current constraints, I think the proposal is the right direction. I do agree this means we are taking on distributed queue / locking / recovery correctness as MLflow product scope.

@TomeHirata, what specific implementation detail would be most helpful to spell out next so we can move forward with implementation?

Author

I'll add:

  1. Huey supports SQLite but I didn't see support for other database types.
  2. Celery does support a SQLAlchemy plugin, but it would not provide the per-experiment exclusive lock mechanism that MLflow jobs already have.

@TomeHirata TomeHirata Mar 24, 2026

It seems Huey supports other types of SQL-backed durability (https://github.com/coleifer/huey/blob/master/huey/contrib/sql_huey.py), wasn't this enough?

Author

I added a section for a hybrid approach that leverages Huey.

`JobExecutionContext.workspace`, while executors themselves remain workspace-unaware.

Multi-replica coordination assumes a transactional tracking database such as PostgreSQL, MySQL, or MSSQL. SQLite is
acceptable for single-process local use, but it is not a safe foundation for multi-replica lease and lock coordination.
@TomeHirata TomeHirata Mar 23, 2026

nit: I overall agree with the statement, but note that by default, mlflow server spins up multiple uvicorn workers.


Each job token is granted only the permissions needed for the job that owns it:

- `EDIT` on the target experiment


I wonder if we create an attack vector that allows attackers to access a resource they are not permitted to through job execution. Also, the current design requires us to list required permissions for each job type, but identifying all required permissions for complex jobs like prompt optimization is not trivial.
So I wonder if we should just carry the caller's permission. Concretely,

  1. When a job is submitted, we authenticate the user and generate a short-term token (job token)
  2. The user ID and the job token are included in the HTTP header when the job executor calls the tracking server
  3. The tracking server verifies the token and authorizes the request based on the user's permissions
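Steps 1-3 above can be sketched with a short-lived HMAC-signed token. Everything here is illustrative (the signing key source, claim names, and wire format are assumptions, not the RFC's design), but it shows the issue/verify split between submission time and the tracking server.

```python
import base64, hashlib, hmac, json, time

# Illustrative short-lived job token: an HMAC-signed payload carrying the user
# ID and an expiry. All names and the wire format are a sketch, not the RFC's.
SECRET = b"server-side-signing-key"  # would come from server configuration

def issue_job_token(user_id: str, ttl_seconds: int = 300) -> str:
    payload = json.dumps({"sub": user_id, "exp": time.time() + ttl_seconds}).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + sig

def verify_job_token(token: str) -> str:
    encoded, sig = token.rsplit(".", 1)
    payload = base64.urlsafe_b64decode(encoded.encode())
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("invalid job token signature")
    claims = json.loads(payload)
    if claims["exp"] < time.time():
        raise PermissionError("job token expired")
    return claims["sub"]  # the user ID the tracking server authorizes against
```

The executor only ever forwards the opaque token; the tracking server resolves it back to a user and applies that user's permissions.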

Author

Thanks for calling this out. I agree that identifying the required permissions is not trivial with the current shape of the job, especially for optimize_prompts_job, because some of that dependency resolution still happens inside the job at execution time.

That said, it seems like we can make this workable with a refactor rather than by carrying the caller's full permissions at runtime. In particular, if we move more of the dependency/resource resolution for optimize_prompts_job to the server side at submission time instead of inside the job body, we should be able to determine the required resources up front without causing breaking changes for existing users. For remote job executors, we can also require gateway-backed model usage so the set of required permissions stays explicit and bounded.

I still think we should keep the remote execution path least-privileged. We also should add a check to ensure that the user creating an online scorer already holds the permissions required for the job to run, which should help prevent privilege escalation.

There is still some residual risk here because MLflow permissions are not scoped at the run level today. So a token scoped to an experiment could theoretically read or modify other runs in that experiment that it does not strictly need. That is not ideal, but I think it is an acceptable and explicit limitation for now, and still safer than giving the job the caller's full live permissions.

I'll work on a section in the doc that proposes refactoring optimize_prompts_job.


Thanks. Yeah, it's ideal to verify all required permissions at the submission time based on the job's business logic and the caller's permission. I just called out that bringing the user's permission is probably the most efficient way to avoid breaching the caller's permission. If we think we can check the required permission for all job types at the submission time, let's document the decision and what types of refactoring are necessary.

This is one of the core security benefits of the remote model. The remote backend gets a scoped token, not broad
provider credentials.

`optimize_prompts_job` is excluded from this path by design. It still participates in the common framework, but remains


ditto, we should be able to use gateway:/... in optimize_prompts_job too.


## Drawbacks

1. This proposal moves more logic into the core MLflow job framework. Huey previously hid some of that complexity.


This is a bit concerning. cc: @WeichenXu123 who made the decision for huey


# Open questions

1. Should `python_env` remain part of the `@job` decorator contract? It is currently unused in practice, and keeping it


IIRC, python_env is for installing extra packages required by the job. Don't we still need this if we want to allow users to use extra packages in the remote executor?

Author

@TomeHirata the main reason for the open question is to simplify things by no longer allowing the Python version to be specified. We'd still want the extra packages though.


Got it. How do we handle an edge case where the Python version of the job executor cannot install the required packages for the job?


def start_executor(self) -> None: ...

def stop_executor(self) -> None: ...


q: when is stop_executor called?

Author

The intent was to call it on server shutdown so that any daemons/processes can shut down gracefully. I'll add a note in the doc. The main motivation is that each plugin implementation doesn't have to track the server process state to determine when to clean up.
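One minimal way to get that shutdown behavior is to register the cleanup at startup, so plugins never watch server state themselves. The class and registration point below are illustrative, not the RFC's final wiring.

```python
import atexit

# Sketch: the framework registers stop_executor at start so plugins need not
# track server process state. Class and wiring are illustrative assumptions.
class SubprocessJobExecutor:
    def __init__(self):
        self.running = False

    def start_executor(self) -> None:
        self.running = True
        # run stop_executor automatically when the server process exits
        atexit.register(self.stop_executor)

    def stop_executor(self) -> None:
        # terminate worker daemons/subprocesses gracefully here
        self.running = False
```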

@abstractmethod
def cancel_job(self, job_id: str) -> None: ...

def recover_jobs(self, unfinished_job_ids: list[str]) -> list[JobRecoveryResult]: ...


Shouldn't recover_jobs also be an abstract method?

Author

Good point, I can't see a reason not to make it one. Thanks!
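Making recover_jobs abstract means incomplete backends fail at instantiation rather than at recovery time. A sketch, with signatures following the RFC excerpt and JobRecoveryResult stubbed as a plain list element for illustration:

```python
from abc import ABC, abstractmethod

# Sketch of the interface with recover_jobs made abstract, per the thread
# above. JobRecoveryResult is stubbed out; only the shape is illustrated.
class JobExecutor(ABC):
    @abstractmethod
    def cancel_job(self, job_id: str) -> None: ...

    @abstractmethod
    def recover_jobs(self, unfinished_job_ids: list) -> list:
        """Each backend must decide how to reconcile unfinished jobs."""

class IncompleteExecutor(JobExecutor):
    # recover_jobs is intentionally missing, so instantiation fails
    def cancel_job(self, job_id: str) -> None:
        pass
```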

gateway_uri: str | None = None # optional MLflow AI Gateway base URI reachable from the job runtime
token: str | None = None # used by remote executors
workspace: str | None = None
pip_config: PipConfig | None = None # pip install settings for local or remote runtimes
@TomeHirata TomeHirata Mar 23, 2026

Can we reuse the existing _PythonEnv data model?

Author

Is your preference to expand _PythonEnv and reuse it here, since it doesn't have the configuration for the PyPI index that the proposed PipConfig does?


These fields in JobExecutionContext are immutable; why not just treat them as part of the job params?


If JobExecutionContext is used by JobExecutor, can we pass JobExecutionContext to JobExecutor's constructor instead?

Author

@WeichenXu123 good questions. My intent with JobExecutionContext is to keep framework-owned runtime metadata separate from the job's logical params.

Even if fields like tracking_uri, workspace, and the remote token are immutable for a given run, they are not part of the job function's business input. Some are deployment-derived and some are framework-generated at execution time, so putting them into params would blur the boundary between user/job inputs and runtime/executor metadata.

For the same reason, I don't think they belong on the executor constructor either, since the executor instance is deployment-scoped while this context is per job run. I think submit_job(..., context=...) is still the right shape.

That said, I'll go ahead and update the proposal to extend _PythonEnv rather than introduce a separate PipConfig type.
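An extension along those lines might look like the sketch below. This is a hypothetical shape: _PythonEnv's real fields are not reproduced here, and the field and method names are assumptions used only to show where pip index settings could live.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical extension of the internal _PythonEnv data model with the pip
# index settings the proposed PipConfig carried; all names are illustrative.
@dataclass
class PythonEnvWithPip:
    dependencies: list = field(default_factory=list)
    index_url: Optional[str] = None          # custom PyPI index
    extra_index_urls: list = field(default_factory=list)
    trusted_hosts: list = field(default_factory=list)

    def pip_args(self) -> list:
        """Render the env as arguments for a `pip install` invocation."""
        args = []
        if self.index_url:
            args += ["--index-url", self.index_url]
        for url in self.extra_index_urls:
            args += ["--extra-index-url", url]
        for host in self.trusted_hosts:
            args += ["--trusted-host", host]
        return args + self.dependencies
```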

Signed-off-by: mprahl <mprahl@users.noreply.github.com>
@mprahl mprahl requested a review from TomeHirata March 23, 2026 21:21

mprahl commented Mar 23, 2026

@TomeHirata thanks for the review! I addressed your comments or replied to them. Could you please take another look?

fn_fullname: str,
params: dict[str, Any],
context: JobExecutionContext,
python_env: Any | None = None,


We can already configure python_env for a given job function; do we need to support configuring python_env for an individual job run?

Author

This is the python_env on the job decorator passed down to the backend executor plugin.

Comment on lines +934 to +935
4. Land the Remote Executors RFC after the core abstractions are approved, or
review it in parallel if it helps make the core contract clearer.


Feel free to file the follow-up Remote Executors RFC. I want to review them together and see if anything in the current RFC needs improvement.

Author

@WeichenXu123 thanks for offering to review! It's at #3.

mprahl added a commit to mprahl/mlflow-rfcs that referenced this pull request Mar 24, 2026
This depends on mlflow#2 and adds safe online scoring for custom scorers.

Signed-off-by: mprahl <mprahl@users.noreply.github.com>
@mprahl mprahl force-pushed the online-scoring-plugin branch from 6bc13da to c749254 Compare March 24, 2026 18:37
@mprahl mprahl requested review from WeichenXu123 and dbczumar March 24, 2026 19:20
Signed-off-by: mprahl <mprahl@users.noreply.github.com>
@mprahl mprahl force-pushed the online-scoring-plugin branch from c749254 to 88059cf Compare March 25, 2026 17:36
