[RFC] Code Agent in CodeTrans #331

Merged: 18 commits, Apr 10, 2025

letonghan
Contributor

@letonghan letonghan commented Mar 14, 2025

This RFC proposes the integration of two Agent mechanisms into the CodeTrans Example to enhance the reliability, user experience, and code quality.
The goal is to minimize the propagation of erroneous code and improve the feasibility of automated code translation.

@letonghan letonghan changed the title Add CodeTrans with Agents RFC [RFC] Code Agent in CodeTrans Mar 14, 2025
@joshuayao joshuayao added this to the v1.3 milestone Mar 17, 2025
@yinghu5
Collaborator

yinghu5 commented Mar 24, 2025

[Remind] @ftian1 please help to review the RFC, thank you!

Contributor

@eero-t eero-t left a comment

IMHO this RFC skips too many significant details:

  • If translation is to a compiled language, suitable build tools need to be available for all of them
    • What languages is this targeted at; just Java?
  • If the resulting program relies on external components, those need to be identified & installed before the program can be successfully built or executed
    • Some dependencies can be GB sized
  • There's no mention of how the execution is sandboxed
    • E.g. allowing network connectivity for malicious code would be rather serious (it could trivially do DoS attacks etc), but innocent programs could also be intended for network access
  • Many programs do not run without input, but there's no mention of how input data provision is managed
    • Code translation may also be requested for individual functions, not whole programs

PS. Even if agent would do only code linting instead of execution, that would also need code dependencies to be installed/present (at least their API files).

@letonghan
Contributor Author

Hi @eero-t, thanks for your detailed comments! I will update this RFC later; here are some responses to your questions:

  • For build tool environment:
    • We do need to prepare a separate environment (we are considering Docker) for each language, such as Python, Java, C#, Go and so on.
    • The dependencies will be installed in a separate Docker container for each task.
  • For sandbox security:
    • A security policy is needed for the Docker container to protect against malicious code; this part will be described in detail in the updated RFC.
    • Network access needs to be limited to installing requirements, for example by using a whitelist.
  • For program input:
    • We may ask the user to provide a basic input and output case.

Please let me know if you have other suggestions.
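
As a rough illustration of the per-task container restrictions described above, here is a minimal sketch that only builds a `docker run` argument list. It is not taken from the RFC: the function name `sandbox_command`, the resource limits, and the example image tag are assumptions for illustration. The whitelist filtering itself would have to be enforced outside the container (for example by a proxy or firewall) and is not shown.

```python
from typing import List


def sandbox_command(image: str, argv: List[str], allow_network: bool = False) -> List[str]:
    """Build a `docker run` argument list for one translation task.

    Hypothetical helper: the limits below are illustrative defaults,
    not values proposed in the RFC.
    """
    cmd = [
        "docker", "run", "--rm",                 # remove the container when it exits
        "--memory", "512m",                      # cap memory use
        "--cpus", "1",                           # cap CPU use
        "--pids-limit", "128",                   # limit process count (fork bombs)
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block privilege escalation
    ]
    if not allow_network:
        # Execution phase: no network at all, read-only root filesystem.
        cmd += ["--network", "none", "--read-only"]
    # Install phase (allow_network=True): default network stays enabled so
    # dependencies can be fetched; restricting destinations is left to
    # infrastructure outside this sketch.
    return cmd + [image] + argv


if __name__ == "__main__":
    # "python:3.11-slim" is only an example image tag.
    print(" ".join(sandbox_command("python:3.11-slim", ["python", "-c", "print(1)"])))
```

Splitting the install phase (network allowed) from the execution phase (network disabled) is one possible way to reconcile dependency fetching with the sandbox concerns raised in the review.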

@eero-t
Contributor

eero-t commented Mar 25, 2025

Please let me know if you have other suggestions.

@letonghan The RFC also needs to answer the following questions:

  • What happens when code translation is requested for a target language that the agent does not support (but the LLM does)?
  • Why is the code not linted (which typically finds more problems than running, and is easier)?
  • What advantage does building+running provide over code linting?

I'm also wondering whether each supported language would need its own additional RFC, as that's a rather complex, language-specific topic (language versions, their upgrades, access to their module repositories, security etc).

@eero-t
Contributor

eero-t commented Mar 25, 2025

Will same agents be used also for CodeGen?

@eero-t
Contributor

eero-t commented Mar 25, 2025

Then there's also the performance aspect.

Users expect responses in seconds, but fetching code dependencies for building (or linting) the code could take minutes, maybe even tens of minutes.

Building the code also takes extra time, especially if agents need to do several rounds of builds to get translated code into fully buildable state.

Meaning that:

  • User would need some feedback about the extra, time-consuming steps being performed
  • When agent usage can induce large slowdowns, it would be nice if either:
    • UI would have an option to disable it (when perceived improvement is low enough), or
    • Application could cache query context (e.g. fetched dependencies)
      • This way it's only a one-time (time/BW/CPU) cost per context, instead of the user seeing large lags also for successive queries (and user feedback telling that the app is dumbly doing the same thing over and over again)
      • But it raises the need for context IDs / management, and the question of how long such (large) contexts should persist
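
The context caching idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `context_id`, `ContextCache`, the 16-character ID length, and the time-to-live policy are hypothetical names and choices, not something the RFC defines. The context ID is derived from the target language plus the set of dependencies, and an entry expires after a fixed lifetime.

```python
import hashlib
import time
from typing import Callable, Dict, Iterable, Optional, Tuple


def context_id(language: str, dependencies: Iterable[str]) -> str:
    """Stable ID: same language and same dependency set give the same context."""
    key = language + "\n" + "\n".join(sorted(set(dependencies)))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


class ContextCache:
    """Maps a context ID to the location of already-fetched dependencies.

    Entries older than ttl_s seconds are treated as expired (hypothetical policy).
    """

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, ctx: str) -> Optional[str]:
        entry = self._entries.get(ctx)
        if entry is None:
            return None
        stored_at, path = entry
        if self._clock() - stored_at > self._ttl_s:
            # Expired: drop it so the (large) context does not persist forever.
            del self._entries[ctx]
            return None
        return path

    def put(self, ctx: str, path: str) -> None:
        self._entries[ctx] = (self._clock(), path)


if __name__ == "__main__":
    cache = ContextCache(ttl_s=3600)
    ctx = context_id("python", ["requests", "numpy"])
    if cache.get(ctx) is None:
        # Placeholder path; a real agent would fetch dependencies here.
        cache.put(ctx, "/tmp/deps/" + ctx)
    print(ctx, cache.get(ctx))
```

With something like this, only the first query for a given context pays the fetch cost; how long entries should live remains the open question raised above.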

@letonghan
Contributor Author

I think using lint/bandit is also a great option for code checking.
This could be a two-step process. In the first step, the agent automatically checks the code with tools like lint and fixes simple typos/faults. In the second step, the updated code is executed to make sure it works.
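
A minimal sketch of that two-step flow, assuming Python as the target language. The function names are hypothetical, the built-in `compile()` merely stands in for a real linter, and the child interpreter stands in for a proper sandbox; none of this is the RFC's actual design.

```python
import subprocess
import sys
from typing import Tuple


def static_check(source: str) -> Tuple[bool, str]:
    """Step 1: cheap static check. compile() is a stand-in for a real linter."""
    try:
        compile(source, "<translated>", "exec")
    except SyntaxError as err:
        return False, f"line {err.lineno}: {err.msg}"
    return True, ""


def execute(source: str, timeout_s: float = 5.0) -> Tuple[bool, str]:
    """Step 2: run the code in a child interpreter (a real agent would sandbox this)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    if proc.returncode != 0:
        return False, proc.stderr
    return True, proc.stdout


def check_translation(source: str, run: bool = True) -> Tuple[str, bool, str]:
    """Return (last stage reached, success flag, detail). Execution is optional."""
    ok, detail = static_check(source)
    if not ok or not run:
        return "lint", ok, detail
    ok, detail = execute(source)
    return "execute", ok, detail


if __name__ == "__main__":
    print(check_translation("print(1 + 1)"))
    print(check_translation("print("))
```

The `run` flag reflects the idea that the execution step can be skipped when only the faster static check is wanted.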

@letonghan
Contributor Author

Yes, building the code sandbox, installing dependencies, and executing the code would take a lot of time.
Following the two-step idea above, I prefer to make lint/execution optional, so that it can be enabled/disabled in the web UI.

@lkk12014402
Collaborator

There is a code execution tool in Qwen-Agent: https://github.com/QwenLM/Qwen-Agent/blob/main/qwen_agent/tools/code_interpreter.py

@yinghu5 yinghu5 added the A0 need to scrub label Mar 26, 2025
@minmin-intel

Do we have customer requests for such code translation capabilities in OPEA? Is there a compelling need to invest engineering efforts in this? Shall we think about coding agent as a whole instead of just code translation or code generation? @letonghan @ftian1 @lkk12014402

@eero-t
Contributor

eero-t commented Mar 31, 2025

Shall we think about coding agent as a whole instead of just code translation or code generation?

Considering it for both makes more sense to me. At least I cannot quickly think of any difference between verifying / improving result for code translation, vs code generation.

@letonghan
Contributor Author

letonghan commented Apr 7, 2025

Hi @eero-t @minmin-intel, the RFC is updated.
The lint check tool and the code execution tool could be reused in both the CodeTrans and CodeGen examples, but I think we don't need to combine the RFCs here, since the CodeGen RFC was already merged.
We can make sure it satisfies the needs of the code agent, then develop and refine it in release v1.4.
Let's make sure this RFC is merged for v1.3 before mid-April, thanks!

Contributor

@eero-t eero-t left a comment

This is much better now, but I still have a few comments.

@eero-t
Contributor

eero-t commented Apr 7, 2025

The lint check tool and the code execution tool could be reused in both CodeTrans and CodeGen example, but I think we don't need to combine the RFCs here, since CodeGen RFC was already merged.

Ok.

(I do not see an overlap between #272 and this RFC, except both being RAG, but "before translation" phase is indeed specific just to code translation.)

Contributor

@eero-t eero-t left a comment

Still not a fan of code execution, but the RFC itself looks OK now. There are just a few inconsistencies that would be good to fix before merging.

(Doc updates have resulted in some things being repeated in multiple sections. It would help if the details of each step were described only once, and removed from the more generic sections.)

@joshuayao joshuayao linked an issue Apr 8, 2025 that may be closed by this pull request
2 tasks
@joshuayao joshuayao added this to OPEA Apr 9, 2025
@joshuayao joshuayao moved this to In review in OPEA Apr 9, 2025
@joshuayao joshuayao added the documentation Improvements or additions to documentation label Apr 9, 2025
@joshuayao joshuayao merged commit c5bb594 into opea-project:main Apr 10, 2025
4 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in OPEA Apr 10, 2025
louie-tsai pushed a commit to intel-ai-tce/docs that referenced this pull request Apr 16, 2025
* Add CodeTrans with Agents RFC

Signed-off-by: letonghan <[email protected]>

* update diagram

Signed-off-by: letonghan <[email protected]>

* refine pre-llm agent design

Signed-off-by: letonghan <[email protected]>

* refine rfc according to comments

Signed-off-by: letonghan <[email protected]>

* revert file name change

Signed-off-by: letonghan <[email protected]>

* fix typo

Signed-off-by: letonghan <[email protected]>

* refine descriptions of retry limits in use case

Co-authored-by: Eero Tamminen <[email protected]>

* refine rfc according to comments

Signed-off-by: letonghan <[email protected]>

* refine descriptions

Co-authored-by: Eero Tamminen <[email protected]>

* Update community/rfcs/25-03-14-GenAIExample-001-CodeTrans-with-Agents.md

Co-authored-by: Eero Tamminen <[email protected]>

* Update community/rfcs/25-03-14-GenAIExample-001-CodeTrans-with-Agents.md

Co-authored-by: Eero Tamminen <[email protected]>

---------

Signed-off-by: letonghan <[email protected]>
Co-authored-by: Eero Tamminen <[email protected]>
Signed-off-by: Tsai, Louie <[email protected]>
ashahba pushed a commit that referenced this pull request Apr 16, 2025
* Getting Started Guide: ITAC steps update (#343)

* ITAC steps update

Signed-off-by: alexsin368 <[email protected]>

* remove FaqGen reference since it is merged into ChatQnA

Signed-off-by: alexsin368 <[email protected]>

* remove 1st and 2nd person words, NGINX notes

Signed-off-by: alexsin368 <[email protected]>

* ITAC steps update

Signed-off-by: alexsin368 <[email protected]>

* remove FaqGen reference since it is merged into ChatQnA

Signed-off-by: alexsin368 <[email protected]>

* remove 1st and 2nd person words, NGINX notes

Signed-off-by: alexsin368 <[email protected]>

* update docker install script and path to docs repo

Signed-off-by: alexsin368 <[email protected]>

---------

Signed-off-by: alexsin368 <[email protected]>
Signed-off-by: Tsai, Louie <[email protected]>

* [RFC] Code Agent in CodeTrans (#331)

* [RFC] unified benchmark script for all examples under GenAIExamples (#276)

* add GenAIExamples benchmark design doc

* Update GenAIExamples Benchmark RFC

* Fix typo in benchmark RFC and revise deploy section

* Fix typos in the benchmark RFC

---------

Co-authored-by: Ying Hu <[email protected]>
Signed-off-by: Tsai, Louie <[email protected]>

* RFC: Haystack OPEA Integration (#222)

* Haystack integration rfc

Signed-off-by: Gad Markovits <[email protected]>

* Removed extraneous item from components list

Signed-off-by: Gad Markovits <[email protected]>

---------

Signed-off-by: Gad Markovits <[email protected]>
Signed-off-by: Tsai, Louie <[email protected]>

* add OpenTelemetry_OPEA_Guide.rst and ChatQnA.md for telemetry support

Signed-off-by: Tsai, Louie <[email protected]>

* Adding AgentQnA.md for Telemetry on AgentQnA

Signed-off-by: Tsai, Louie <[email protected]>

* Update tutorial/OpenTelemetry/deploy/AgentQnA.md

Co-authored-by: Copilot <[email protected]>
Signed-off-by: Tsai, Louie <[email protected]>

* Update index.rst

Signed-off-by: Tsai, Louie <[email protected]>

* Update tutorial/OpenTelemetry/OpenTelemetry_OPEA_Guide.rst and ChatQnA.md

Co-authored-by: Malini Bhandaru <[email protected]>
Signed-off-by: Tsai, Louie <[email protected]>

* removing redundant empty lines

Signed-off-by: Tsai, Louie <[email protected]>

* addressed comments

Signed-off-by: Tsai, Louie <[email protected]>

* Update tutorial/OpenTelemetry/OpenTelemetry_OPEA_Guide.rst

Co-authored-by: Malini Bhandaru <[email protected]>
Signed-off-by: Tsai, Louie <[email protected]>

---------

Signed-off-by: alexsin368 <[email protected]>
Signed-off-by: Tsai, Louie <[email protected]>
Signed-off-by: letonghan <[email protected]>
Signed-off-by: Gad Markovits <[email protected]>
Co-authored-by: alexsin368 <[email protected]>
Co-authored-by: Letong Han <[email protected]>
Co-authored-by: Eero Tamminen <[email protected]>
Co-authored-by: Tian, Feng <[email protected]>
Co-authored-by: Ying Hu <[email protected]>
Co-authored-by: gadmarkovits <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Malini Bhandaru <[email protected]>
Labels: A0 need to scrub, documentation
Projects: Status: Done
Linked issue: [Feature] Code agent (CodeGen/CodeTrans) - Phase 1: RFC
7 participants