Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Code Agent in CodeTrans #331

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

letonghan
Copy link

@letonghan letonghan commented Mar 14, 2025

This RFC proposes the integration of two Agent mechanisms into the CodeTrans Example to enhance the reliability, user experience, and code quality.
The goal is to minimize the propagation of erroneous code and improve the feasibility of automated code translation.

@letonghan letonghan changed the title Add CodeTrans with Agents RFC [RFC] Code Agent in CodeTrans Mar 14, 2025
@joshuayao joshuayao added this to the v1.3 milestone Mar 17, 2025
@yinghu5
Copy link
Collaborator

yinghu5 commented Mar 24, 2025

[Remind] @ftian1 please help to review the RFC, thank you!

Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO this RFC skips too many significant details:

  • If translation is to compiled language, suitable build tools need to available for all of them
    • What languages this is targeted at; just Java?
  • If resulting program relies on external components, those need to be identified & installed, before program can be successfully build or executed
    • Some dependencies can be GB sized
  • There's no mention of how the execution is sandboxed
    • E.g. allowing network connectivity for malicious code would be rather serious (it could trivially do DOS attacks etc), but also innocent programs could be intended for network access
  • Many programs do not run without input, but there's no mention of how input data provision is managed
    • Code translation may be also asked for individual functions, not whole programs

PS. Even if agent would do only code linting instead of execution, that would also need code dependencies to be installed/present (at least their API files).

@letonghan
Copy link
Author

IMHO this RFC skips too many significant details:

  • If translation is to compiled language, suitable build tools need to available for all of them

    • What languages this is targeted at; just Java?
  • If resulting program relies on external components, those need to be identified & installed, before program can be successfully build or executed

    • Some dependencies can be GB sized
  • There's no mention of how the execution is sandboxed

    • E.g. allowing network connectivity for malicious code would be rather serious (it could trivially do DOS attacks etc), but also innocent programs could be intended for network access
  • Many programs do not run without input, but there's no mention of how input data provision is managed

    • Code translation may be also asked for individual functions, not whole programs

PS. Even if agent would do only code linting instead of execution, that would also need code dependencies to be installed/present (at least their API files).

Hi @eero-t , thanks for your detailed comments! I will update this RFC later, here's some responses of your questions:

  • For build tool environment:
    • We do need to prepare different environment (considering using docker) for many languages, such as Python, Java, C#, GO and so on.
    • The dependencies will be installed in a seperate docker container for each task.
  • For sandbox security:
    • A security policy is needed for docker container to prevent from malicious attack, this part will be updated in RFC in detail.
    • The network needs to be limited only for requirements installing, for example, using a white list.
  • For program input:
    • We may ask user to provide a basic input and output case.

Please let me know if you have other suggestions.

@eero-t
Copy link
Contributor

eero-t commented Mar 25, 2025

Please let me know if you have other suggestions.

@letonghan RFC needs to answer also following questions:

  • What is done when code translation is asked for target language that agent does not support (but LLM does)?
  • Why code is not linted (which typically finds more problems than running, and is easier)?
  • What advantage building+running provides over code linting?

I'm also wondering whether each supported language would need its own additional RFC, as that that's rather complex, language specific topic (language versions, their upgrades, access to their module repositories, security etc).

@eero-t
Copy link
Contributor

eero-t commented Mar 25, 2025

Will same agents be used also for CodeGen?

@eero-t
Copy link
Contributor

eero-t commented Mar 25, 2025

Then there's also the performance aspect.

Users expect responses in seconds, but fetching code dependencies for building (or linting) the code could take minutes, maybe even tens of minutes.

Building the code also takes extra time, especially if agents need to do several rounds of builds to get translated code into fully buildable state.

Meaning that:

  • User would need some feedback of the extra, time-consuming steps being performed
  • When agent usage can induce large slowdowns, it would be nice if either:
    • UI would have an option to disable it (when perceived improvement is low enough), or
    • Application could cache query context (e.g. fetched dependencies)
      • This way it's only one-time (time/BW/CPU) cost per context, instead of user seeing large lags also for successive queries (and user feedback telling that app is doing dumbly same thing over-and-over again)
      • But it raises the need for context IDs / management, and question of how long such (large) contexts should persist?

@letonghan
Copy link
Author

  • What is done when code translation is asked for target language that agent does not support (but LLM does)?
  • Why code is not linted (which typically finds more problems than running, and is easier)?
  • What advantage building+running provides over code linting?

I'm also wondering whether each supported language would need its own additional RFC, as that that's rather complex, language specific topic (language versions, their upgrades, access to their module repositories, security etc).

I think using lint/bandit is also a great option for code checking.
This could be a two-step thing. For the firt step, agent will automatically check code with tools like lint and fix simple typos/faults. For the second step, the updated code will be executed to make sure it works.

@letonghan
Copy link
Author

  • User would need some feedback of the extra, time-consuming steps being performed

  • When agent usage can induce large slowdowns, it would be nice if either:

    • UI would have an option to disable it (when perceived improvement is low enough), or

    • Application could cache query context (e.g. fetched dependencies)

      • This way it's only one-time (time/BW/CPU) cost per context, instead of user seeing large lags also for successive queries (and user feedback telling that app is doing dumbly same thing over-and-over again)
      • But it raises the need for context IDs / management, and question of how long such (large) contexts should persist?

Yes, building code sandbox, install dependencies, and execute it would take a lot of time.
As the two-steps thought above, I prefer to make link/execution as optional, which could be enabled/disabled in the web UI.

@lkk12014402
Copy link
Collaborator

  • User would need some feedback of the extra, time-consuming steps being performed

  • When agent usage can induce large slowdowns, it would be nice if either:

    • UI would have an option to disable it (when perceived improvement is low enough), or

    • Application could cache query context (e.g. fetched dependencies)

      • This way it's only one-time (time/BW/CPU) cost per context, instead of user seeing large lags also for successive queries (and user feedback telling that app is doing dumbly same thing over-and-over again)
      • But it raises the need for context IDs / management, and question of how long such (large) contexts should persist?

Yes, building code sandbox, install dependencies, and execute it would take a lot of time. As the two-steps thought above, I prefer to make link/execution as optional, which could be enabled/disabled in the web UI.

there is code execution tool https://github.com/QwenLM/Qwen-Agent/blob/main/qwen_agent/tools/code_interpreter.py

@yinghu5 yinghu5 added the A0 need to scrub label Mar 26, 2025
@minmin-intel
Copy link

Do we have customer requests for such code translation capabilities in OPEA? Is there a compelling need to invest engineering efforts in this? Shall we think about coding agent as a whole instead of just code translation or code generation? @letonghan @ftian1 @lkk12014402

@eero-t
Copy link
Contributor

eero-t commented Mar 31, 2025

Shall we think about coding agent as a whole instead of just code translation or code generation?

Considering it for both makes more sense to me. At least I cannot quickly think of any difference between verifying / improving result for code translation, vs code generation.

@letonghan
Copy link
Author

letonghan commented Apr 7, 2025

Hi @eero-t @minmin-intel , the RFC is updated.
The lint check tool and the code execution tool could be reused in both CodeTrans and CodeGen example, but I think we don't need to combine the RFCs here, since CodeGen RFC was already merged.
We can make sure it satisfies the needs of code agent, then develop and refine it in release v1.4.
Let's make sure this RFC be merged in v1.3 before middle April, thanks!

Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much better now, but I still have few comments.

This RFC proposes the integration of two Agent mechanisms into the CodeTrans Example to enhance the reliability, user experience, and code quality. The goal is to minimize the propagation of erroneous code and improve the feasibility of automated code translation.

- Pre-LLM Agent: Validates the correctness of the input code before it is processed by the LLM. If errors are detected, the agent attempts to automatically fix them to ensure the code is executable. If the correction is successful, the modified code proceeds to the LLM.
- Post-LLM Agent: Executes the translated code after it has been generated by the LLM. If the execution fails, the agent captures the error and sends it back to the LLM for re-generation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to info later in the doc, linting is done also in post-LLM phase (which is good thing to do).

How the CodeTrans Helps:

- Post-LLM Agent tracks retry attempts.
- If the LLM fails to produce a correct version after three attempts (or a configurable limit), the system stops further retries.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is likely to need both user-configurable limit, and agent-instance upper limit for user-selected one:

Suggested change
- If the LLM fails to produce a correct version after three attempts (or a configurable limit), the system stops further retries.
- If the LLM fails to produce a correct version after configurable number of attempts, the system stops further retries.


UI Components:

- Lint Check Button: Select to do lint check for input/output codes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linters can have a large number of plugins and rules that can be applied to the code, and often what needs to be used, are project specific. So eventually user will need to have control over those too.

But for now it would be enough for lint option be separately selectable both for input and output.


- Validates code correctness, structures input/output, executes the code, and evaluates the result.
- Runs the user-provided code to check for syntax or logical errors.
- If errors are detected, the agent attempts to automatically fix them.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are lint errors supposed to fixed by linter or LLM, or both?

(While many linters can fix simple issues, they may fail with more serious ones.)


| Coding Language | Lint Tool | Introduction | Reference |
| --------------- | ------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- |
| Python | Pylint | A tool that checks for errors in Python code, tries to enforce a coding standard and looks for bad code smells. | [link](https://www.pylint.org/) |
Copy link
Contributor

@eero-t eero-t Apr 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are few additional Python linters which usage is suggested by PyLint:
https://pylint.readthedocs.io/en/latest/index.html#advised-linters-alongside-pylint

Comment on lines +301 to +302
docker run --rm -m 512m --cpus="1" --pids-limit=128 \
--security-opt no-new-privileges -v $(pwd)/code:/code \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use only the necessary capabilities, and drop all others:
--cap-drop ALL --cap-add <what is needed>

```
2. execution: run codes using mounted dependencies
```bash
docker run --rm -v $(pwd):/code --network=none \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why running is not secured, like dep install is?

| Output Code Validation | Ensures reliable and accurate code conversion. |
| Automated Debug Feedback | Reduces trial-and-error, improving LLM accuracy. |
| Lint Static Code Check | Catch bugs early and enforce consistent code quality. |
| Secure Execution Environment | Protects the system from malicious code. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is problem added by code execution (and potentially linting / dep install), not a benefit.

I think there should be a potential problem section. For example something like below.


Risks and their mitigations / user workarounds

  • Node / cluster take over by execution of malicious code
    • Mitigation: automated vetting of the executed code + its strict sandboxing
  • Code execution exhausting node resources
    • Mitigation: strict resource usage limits
  • Application response taking too long due to dependency install / code execution
    • Mitigation: dependency caching + enforced execution timeouts + error response to user
    • Workaround: user disables linting / code execution
  • Users can affect each others' results
    • Mitigation: (dependency) caching is per-user session
  • Code execution failing translation due to limits / sandboxing / dependency being offline
    • Workaround: user disables code execution / linting

Comment on lines +326 to +329
### Phase 1: Develop Code Execution Tool, target v1.3

- Develop the Code Execution Tool in Agent to provide a secure execution environment.
- Ensure the code execution environment is securely isolated.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO this is less important that the linting features, intended to cover just one of the languages, and more likely to be service security and performance issue, so it should be done later.


* To ensure that the code runs in a fully isolated environment, the tool needs to use container-based sandbox like `Docker`.
* In the context of resource constraints, the resource limits (including memory, CPU, process numbers) and security policies are needed.
* Since the dependencies need to be installed first, the network authority will be processed in two stages:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intention that the installed dependencies would be be shared between invocations? And if yes, only for the same user, or also between different users?

Note: I'm asking because Python requirements may include conflicting versions of the dependencies, especially if they're pulled by different modules.

@eero-t
Copy link
Contributor

eero-t commented Apr 7, 2025

The lint check tool and the code execution tool could be reused in both CodeTrans and CodeGen example, but I think we don't need to combine the RFCs here, since CodeGen RFC was already merged.

Ok.

(I do not see an overlap between #272 and this RFC, except both being RAG, but "before translation" phase is indeed specific just to code translation.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A0 need to scrub
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants