
Conversation

Contributor

@hsteude hsteude commented Jul 31, 2025

The README file that lives at pipelines/pipe-fiction/README.md will hopefully explain why this might be useful and how to test it.

@hsteude hsteude requested review from Copilot and geier and removed request for Copilot July 31, 2025 13:46

Copilot AI left a comment


Pull Request Overview

This PR introduces a comprehensive KFP (Kubeflow Pipelines) development and debugging demo called "pipe-fiction" that addresses challenges in developing and debugging ML pipelines. The demo provides multiple execution environments (subprocess, Docker, and cluster) with remote debugging capabilities.

Key changes:

  • Demonstrates code organization patterns for KFP development with separation between core Python packages and pipeline orchestration
  • Implements remote debugging infrastructure using debugpy for IDE integration across all execution environments
  • Provides monkey patches for older KFP versions to enable port mapping and environment variable support in DockerRunner
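The monkey-patching approach in the last bullet can be illustrated generically. The sketch below shows only the technique, not the actual KFP patch: `Runner`, its attributes, and the kwargs are stand-ins invented for this example, while the real patch in kfp_docker_monkey_patches.py targets KFP internals.

```python
# Minimal illustration of the monkey-patching technique: rebind a method
# on a class at import time so an older version accepts new kwargs.
# `Runner` is a stand-in invented for this sketch, NOT kfp.local.DockerRunner.

class Runner:
    """Stand-in for an older runner whose __init__ knows nothing of ports/env."""

    def __init__(self):
        self.image = "python:3.11"


def patched_init(self, ports=None, env=None):
    """Replacement __init__ that additionally stores port mappings and env vars."""
    self.image = "python:3.11"
    self.ports = ports or {}
    self.env = env or {}


# Applying the patch: assigning on the class replaces the method for every
# subsequent instantiation (a patch module would do this at import time).
Runner.__init__ = patched_init

runner = Runner(ports={"5678/tcp": 5678}, env={"DEBUG_MODE": "true"})
print(runner.ports, runner.env)
```

Because the patch runs at module scope, importing the patch module once is enough to affect all later instantiations.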

Reviewed Changes

Copilot reviewed 20 out of 24 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| kfp_docker_monkey_patches.py | Monkey patches for KFP DockerRunner to enable port mapping and environment variables in older versions |
| auth_session.py | Utility for obtaining Istio/Dex authentication sessions for Kubeflow cluster access |
| submit_to_cluster_from_remote.py | Script for submitting pipelines to remote Kubeflow clusters with authentication |
| submit_to_cluster_from_kf_notebook.py | Simple pipeline submission script for use within Kubeflow notebooks |
| run_locally_in_subproc.py | Local pipeline execution using the subprocess runner |
| run_locally_in_docker.py | Local pipeline execution using the Docker runner with debugging support |
| pipeline.py | Main pipeline definition with debugging configuration |
| components.py | KFP component definitions with remote debugging capabilities |
| Various config files | Project configuration, dependencies, and VS Code debugging setup |
Comments suppressed due to low confidence (1)

pipelines/pipe-fiction/pipelines/pyproject.toml:13

  • pip version 25.1.1 does not exist. As of my knowledge cutoff in January 2025, the latest pip version was in the 24.x series. This version specification will cause dependency resolution to fail.
    "pip>=25.1.1",

@@ -0,0 +1,207 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[codz]

Copilot AI Jul 31, 2025


The file extension pattern has a typo. It should be '.py[cod]' instead of '.py[codz]' to match Python compiled bytecode files (.pyc, .pyo, .pyd).

Suggested change
*.py[codz]
*.py[cod]


Usage (exactly like upstream KFP 2.14+):
import kfp_docker_patches # Apply patches
from kfp import local

Copilot AI Jul 31, 2025


The usage example in the docstring shows importing 'kfp_docker_patches' before 'from kfp import local', but the actual import should be 'from utils import kfp_docker_monkey_patches' or similar, and the patches are already applied when the module is imported due to line 176.

@tmvfb tmvfb self-requested a review October 6, 2025 17:53

Submit to the cluster:
```bash
python run_in_k8s_cluster.py
```
Collaborator


This script is not a part of the repo anymore

├── components.py # KFP component definitions (import from base image)
├── pipeline.py # Pipeline assembly
├── run_locally_*.py # Local execution scripts
├── run_in_k8s_cluster.py # Remote execution
Collaborator


This script is not a part of the repo anymore

submit_to_cluster_from_kf_notebook.py
and
submit_to_cluster_from_remote.py

are not mentioned

4. Build and push Docker image when ready for submission to the cluster (this could also be done in a CI/CD pipeline):
`docker build -t <your-registry>/<your-image-name>:<your-tag> . && docker push <your-registry>/<your-image-name>:<your-tag>`
5. Update image reference in pipeline components if needed
6. Submit pipeline to cluster: `python submit_to_cluster.py`
Collaborator


this script does not exist anymore

also

submit_to_cluster_from_kf_notebook.py
and
submit_to_cluster_from_remote.py

will require some documentation on how to set env vars

url=os.environ["KUBEFLOW_ENDPOINT"],
username=os.environ["KUBEFLOW_USERNAME"],
password=os.environ["KUBEFLOW_PASSWORD"],
)
Collaborator


I think these are not going to work with our Keycloak setup; it didn't work for me on the BM cluster, failing with the following error:

Traceback (most recent call last):
  File "/Users/igorkvachenok/kubeflow-examples/pipelines/pipe-fiction/pipelines/submit_to_cluster_from_remote.py", line 11, in <module>
    auth_session = get_istio_auth_session(
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/igorkvachenok/kubeflow-examples/pipelines/pipe-fiction/pipelines/utils/auth_session.py", line 76, in get_istio_auth_session
    raise RuntimeError(
RuntimeError: HTTP status code '404' for GET against: https://genai2.bm.justadd.ai/auth/realms/prokube/protocol/openid-connect/auth/local?client_id=dex-oidc-client&redirect_uri=https%3A%2F%2Fgenai2.bm.justadd.ai%2Fdex%2Fcallback&response_type=code&scope=openid+openid+email+profile+groups+offline_access&state=bjub3qy5xt2ov5q5jngwjxeg3

AFAIK, to run this with keycloak, we would need to create a custom client specifically for this.
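As a starting point for the missing env-var documentation: the submit script reads its configuration from the three environment variables shown in the snippet above. A sketch with placeholder values (endpoint, username, and password are invented here and must be replaced with cluster-specific ones):

```shell
# Placeholder values; the variable names are the ones read via os.environ
# in submit_to_cluster_from_remote.py.
export KUBEFLOW_ENDPOINT="https://kubeflow.example.com"
export KUBEFLOW_USERNAME="user@example.com"
export KUBEFLOW_PASSWORD="changeme"
echo "Submitting to $KUBEFLOW_ENDPOINT"
```

After exporting these in the same shell, `python submit_to_cluster_from_remote.py` can pick them up.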

5. Rebuild the image if needed and push it to your registry:
`docker push <your-registry>/<your-image-name>:<your-tag>`
6. Update image reference in pipeline components if needed
7. Submit pipeline to cluster: `python submit_to_cluster.py`
Collaborator


this script does not exist anymore


```bash
cd pipelines
python submit_to_cluster.py
```
Collaborator


this script does not exist anymore

For pipeline-only changes:
1. Modify files in `pipelines/` directory
2. Enable remote debugging for the task you want to debug (see remote debugging section for details)
3. Submit directly to cluster: `python submit_to_cluster.py`
Collaborator


this script does not exist anymore


Cluster:
```bash
python run_in_k8s_cluster.py
```
Collaborator


this script does not exist anymore

- Slowest feedback - submission and scheduling overhead
- Complex setup - requires cluster access and networking

## Remote Debugging
Collaborator

@tmvfb tmvfb Oct 6, 2025


What exactly is meant by "remote"? I thought this section is going to be about running KFP in-cluster and running the debugger locally, but looks like this is not the case

I also found the section structure a bit confusing:

Debuggable Component Decorator (Recommended)

Manual Component Setup (Alternative)

VS Code Setup

Debugging Workflow

Cluster Debugging with Port Forwarding

I think the first 2 could be a part of one subsection. Probably the best way to make this clear is to describe in advance, which section serves which purpose, because the logic is (as I understood it):

  1. Use Debugging Workflow section for local debugging
  2. Use Cluster Debugging with Port Forwarding for in-cluster debugging
  3. Use Debuggable Component Decorator and Manual Component Setup for recipes on how to enable debugging for components
  4. Use VS Code Setup to set up debugging in VS Code (is this whole section dedicated to VS Code only, or could we run it anywhere?)
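For the in-cluster case (point 2 above), the local debugger typically attaches through a port-forward. A sketch, assuming the component waits for a client on debugpy's conventional port 5678; the namespace and pod names are placeholders:

```bash
kubectl port-forward -n <your-namespace> pod/<component-pod> 5678:5678
```

With the forward in place, the IDE attaches to localhost:5678 exactly as in the local case.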


### VS Code Setup

Create `.vscode/launch.json`:
Collaborator


we already have this file in the repo, and it differs a bit from what's specified in the readme (doesn't have "Pipeline: Remote KFP/DockerRunner" section). Should we update it?
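For reference, a minimal attach configuration of the kind the README describes might look like the sketch below. The configuration name is invented, and port 5678 assumes debugpy's default; adjust both to match the repo's actual launch.json:

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Attach to KFP component (debugpy)",
      "type": "debugpy",
      "request": "attach",
      "connect": { "host": "localhost", "port": 5678 },
      "justMyCode": false
    }
  ]
}
```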

Collaborator

tmvfb commented Oct 7, 2025

@hsteude, in general awesome job. I had no idea about KFP debugging and had never used VS Code, so I had to download it and take a look; it worked for me and was well written and understandable.

My main feedback is as follows:

  1. We need to update outdated parts of README (see comments)
  2. I found the remote debugging section a bit confusingly structured (although the information there looks correct)
  3. I added a commit where I use an env var for the image that I build (it was hardcoded before, which was not mentioned anywhere when I first encountered the code in the README; feel free to remove the commit if it doesn't make sense)
  4. I think the remote instructions won't work with our keycloak setup. We need to either find a way to make those work, or drop them from the repo.

