# 25-03-14-GenAIExamples-001-CodeTrans-with-Agents

## Author(s)

[Han, Letong](https://github.com/letonghan)

## Objective

This RFC proposes the integration of two Agent mechanisms into the CodeTrans Example to enhance reliability, user experience, and code quality. The goal is to minimize the propagation of erroneous code and improve the feasibility of automated code translation.

- Pre-LLM Agent: Validates the correctness of the input code before it is processed by the LLM. If errors are detected, the agent attempts to automatically fix them to ensure the code is executable. If the correction is successful, the modified code proceeds to the LLM.
- Post-LLM Agent: Performs a lint check on and executes the translated code after it has been generated by the LLM. If the execution fails, the agent captures the error and sends it back to the LLM for re-generation.

Moreover, this design introduces a user-configurable **three-step validation pipeline**, allowing users to enable or disable each stage independently via the frontend UI.

* Step 1: **Auto-fix** – Automatically fix code with syntax errors within the agent.
* Step 2: **Lint Check** – Run a language-specific lint check to catch style or semantic issues.
* Step 3: **Execution** – Securely run code in a sandbox environment to validate it. (Only `Python` is planned for now.)

The Auto-fix step happens in the agent service, while the lint check and execution steps are handled by external tools.
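For illustration, here is a minimal sketch of how the frontend could pass these toggles to the backend; the endpoint, port, and field names (`validation`, `auto_fix`, `lint_check`, `code_execution`) are assumptions for this RFC, not an existing API.

```python
import requests

# Hypothetical request payload; the toggle field names and the endpoint are illustrative only.
payload = {
    "language_from": "Java",
    "language_to": "Python",
    "source_code": "public class Hello {}",
    "validation": {
        "auto_fix": True,         # Step 1: fix syntax errors before the LLM
        "lint_check": True,       # Step 2: language-specific lint check
        "code_execution": False,  # Step 3: sandbox execution (Python only)
    },
}
resp = requests.post("http://localhost:7777/v1/codetrans", json=payload, timeout=60)
print(resp.json())
```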

By introducing these agents, the system ensures that only valid code is passed to the LLM and that generated code is verified before reaching the user, thereby improving the overall efficiency and accuracy of the translation process.

## Motivation

The current CodeTrans flow has three major issues:

1. **User input may contain syntax or logic errors.** Passing faulty code directly to the LLM can result in incorrect or unusable translations.
2. **LLM-generated code isn’t always correct.** Without an automated validation step, users have to manually review and debug the output.
3. **No feedback loop exists.** The LLM doesn't adapt based on execution results, leading to repeated errors.

By introducing Agent mechanisms, we can improve the process in three key ways:

1. **Reduce error propagation**: Ensure that only valid code reaches the LLM, minimizing incorrect translations.
2. **Enhance user experience**: Detect input issues early, providing clear feedback to avoid unnecessary debugging.
3. **Improve code quality**: Automatically verify LLM-generated code and trigger re-generation when needed, increasing overall reliability.

## Use-Cases

### Detecting Errors in Input Code Before Translation

Scenario:

A developer wants to translate a Java program to Python but unknowingly provides code with syntax errors. If the faulty code is passed directly to the LLM, it might generate an incorrect or non-functional Python version.

How the CodeTrans Helps:

- User selects `Lint Check` in the web UI.
- Pre-LLM Agent performs the lint check on the provided Java code.
- If the code has style or semantic issues, the agent attempts to fix them automatically.
- The developer can review and confirm the fixes or manually adjust the code before resubmitting.

### Validating Generated Code for Accuracy

Scenario:

A developer uses the CodeTrans example to translate Java code into Python. The LLM generates a Python version, but there's no guarantee that it runs correctly. Without validation, the developer would have to manually check for errors, which is time-consuming.

How the CodeTrans Helps:

- User selects both `Lint Check` and `Code Execution` in the web UI.
- Post-LLM Agent performs the lint check on the translated Python code.
- The agent automatically fixes any style/semantic issues.
- Post-LLM Agent executes the translated Python code:
- ✅ If the code runs successfully, the system returns the output to the user.
- ❌ If the code fails, the agent captures the error details and sends them back to the LLM.
- The LLM then retries code generation, using the error context to produce a corrected version.

This automated validation ensures that developers receive functional translations without having to manually test and debug every output.

### Preventing Infinite Regeneration Loops

Scenario:

In some cases, the LLM may repeatedly generate faulty code, leading to an endless loop of failed executions and retries. Without a safeguard, this could waste computation resources and frustrate users.

How the CodeTrans Helps:

- Both Pre- and Post-LLM Agents track retry attempts.
- If the LLM fails to produce a correct version after a configurable number of attempts, the system stops further retries.
- Instead of another faulty translation, the user receives:
- ❌ "Code generation failed after multiple attempts. Here are possible reasons and debugging suggestions."
- The system provides relevant error logs and hints, helping the developer troubleshoot the issue efficiently.

This prevents the LLM from getting stuck in an infinite loop and improves user control over the process.

These use cases demonstrate how integrating Agents into the CodeTrans example improves input validation, output verification, and error handling. By ensuring only valid code reaches the LLM and automatically validating generated code, the system reduces errors, minimizes manual debugging, and improves translation accuracy. Retry limits and debugging feedback prevent infinite loops, making the process more reliable, efficient, and user-friendly.

## Design Proposal

### Architecture Diagram

```mermaid
graph LR
%% Colors %%
classDef blue fill:#ADD8E6,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef orange fill:#FBAA60,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
classDef orchid fill:#C26DBC,stroke:#ADD8E6,stroke-width:2px,fill-opacity:0.5
subgraph User Interface
%% direction TD
Frontend[(Frontend Server)]:::orange
UIQ[User Input Query]:::orchid
UIQ --> Frontend
end

Megaservice[(CodeTrans Megaservice)]:::orange

subgraph CodeTrans Server
Agent1([Pre-LLM Agent]):::blue
LLM([LLM MicroService]):::blue
Agent2([Post-LLM Agent]):::blue
Agent1 -->|Refactor input Code| Agent1
Agent1 -->|Verified Code| LLM
LLM-->|Generated Code| Agent2
Agent2 -->|Re-Generate Request| LLM
end

LintTool([Lint Tool]):::blue
CodeExecutionTool([Sandbox Execution Tool]):::blue
Output[Translated Code]:::orchid

Frontend -->|Send Request| Megaservice
Megaservice -->|Send Code| Agent1
Agent1 --> |Lint check| LintTool
Agent1 -->|Validated input code| CodeExecutionTool
Agent2 --> |Lint check| LintTool
Agent2 -->|Validate generated code| CodeExecutionTool
Agent2 -->|Output validated code| Output
```

### Components and Functionality

#### User Interface

UI Server:

- Handles user input (code, source language, target language)
- Sends requests to the CodeTrans megaservice

UI Components:

- Lint Check Button: Select to run a lint check on the input/output code.
- Code Execution Button: Select to execute the code for a functionality check. (Supports Python only.)
- Input/Output Case: If the `Code Execution Button` is selected, the user will need to provide a set of input/output pairs for the code.
- Code Box – Displays the user-provided and the LLM-generated code.
- Code Translation Result – Shows the translated code and, if available, the lint check and execution results.

#### Backend Servers

CodeTrans Megaservice:

* Manages the scheduling of Agents, LLM, and user input/output.

Pre-LLM Agent:

- Validates code correctness, structures input/output, executes the code, and evaluates the result.
- Performs a static lint check.
- If semantic errors are detected, the LLM fixes them according to the lint check report.
- Runs the user-provided code to check for syntax or logical errors.
- If errors are detected, the agent attempts to fix them automatically (within a configurable number of attempts).
- If successfully corrected, the modified code proceeds to the LLM.
- If the errors cannot be resolved, the agent returns an error message, prompting the user to review and manually fix the code before proceeding.
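
For illustration, a minimal sketch of this Pre-LLM flow; `run_lint` and `llm_fix` are hypothetical helpers (wrapping the lint tool and an LLM-based fixer), and the retry limit is configurable:

```python
def pre_llm_validate(code: str, language: str, max_attempts: int = 3) -> str:
    """Lint the input code and try to auto-fix it before it reaches the LLM.

    run_lint and llm_fix are hypothetical helpers: one wraps the lint check tool,
    the other asks the LLM to repair the code based on the lint report.
    """
    for _ in range(max_attempts):
        report = run_lint(code, language)      # static lint check
        if report.ok:
            return code                        # verified code proceeds to translation
        code = llm_fix(code, report.messages)  # attempt an automatic fix
    raise ValueError("Input code could not be fixed automatically; please review it manually.")
```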

LLM Microservice:

- Uses a large language model (LLM) to translate the input code into the target language.

Post-LLM Agent:

- Checks the code statically, executes the LLM-generated code, and verifies its correctness.
- If execution is successful, the translated code is returned to the user.
- If execution fails, the error details are sent back to the LLM for regeneration (within a configurable number of attempts).
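
A similar sketch of the Post-LLM loop, again with hypothetical helpers (`llm_translate`, `run_lint`, `execute_in_sandbox`) standing in for the LLM microservice, lint tool, and execution tool:

```python
def post_llm_validate(source: str, target_language: str, max_attempts: int = 3) -> str:
    """Translate, then lint and execute the result; feed errors back for regeneration."""
    error_context = None
    for _ in range(max_attempts):
        translated = llm_translate(source, target_language, error_context)
        lint_report = run_lint(translated, target_language)    # static check
        result = execute_in_sandbox(translated)                 # dynamic check (Python only)
        if lint_report.ok and result.success:
            return translated                                    # verified translation for the user
        error_context = result.error or lint_report.messages    # context for the retry
    raise RuntimeError("Code generation failed after multiple attempts: " + str(error_context))
```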

Lint Check Tool:

* Performs a lint check on a snippet of code, supporting different coding languages via tools such as `pylint`, `eslint`, `cpplint`, and so on.
* Since lint is a static check, it does not require a separate execution environment and can be invoked directly from a Python script.

Code Execution Tool:

- Provides a secure execution environment (e.g., Docker/Sandbox) to safely run code and prevent malicious execution risks.
- Due to implementation complexity, only a `Python` execution tool will be supported for now.

#### Lint Check Tool

Here's a table of lint tools for different coding languages:

| Coding Language | Lint Tool | Introduction | Reference |
| --------------- | ---------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- |
| Python | Pylint | A tool that checks for errors in Python code, tries to enforce a coding standard and looks for bad code smells. | [link](https://www.pylint.org/) |
| JavaScript | ESLint | A tool for identifying and reporting on patterns found in ECMAScript/JavaScript code, with the goal of making code more consistent and avoiding bugs. | [link](https://eslint.org/docs/latest/use/getting-started) |
| Java | Checkstyle | A development tool to help programmers write Java code that adheres to a coding standard. | [link](https://checkstyle.sourceforge.io/index.html) |
| C++ | cpplint | A command-line tool to check C/C++ files for style issues according to [Google's C++ style guide](http://google.github.io/styleguide/cppguide.html). | [link](https://github.com/cpplint/cpplint) |
| Go | vet | Examines Go source code and reports suspicious constructs, such as Printf calls whose arguments do not align with the format string. | [link](https://pkg.go.dev/cmd/vet) |
| Bash/Shell | ShellCheck | Points out and clarifies typical beginner syntax issues that cause a shell to give cryptic error messages. | [link](https://github.com/koalaman/shellcheck/#readme) |

To use these tools for static checks across different languages, we can save the target code into a temporary file and execute the lint check command in `bash`.

Below is an example script that supports all of these languages.

```bash
#!/bin/bash
# usage: ./lint_tool.sh <language> <source_file>

LANGUAGE="$1"
SOURCE_FILE="$2"
REPORT_FILE="lint_report_${LANGUAGE}.txt"

# prepare file paths for Java checkstyle
CHECKSTYLE_JAR="./checkstyle.jar"
CHECKSTYLE_CONFIG="./google_checks.xml"

if [[ ! -f "$SOURCE_FILE" ]]; then
    echo "Source file not found: $SOURCE_FILE"
    exit 1
fi

case "$LANGUAGE" in
    python)
        echo "Running pylint..."
        pylint "$SOURCE_FILE" > "$REPORT_FILE" 2>&1
        ;;

    javascript)
        echo "Running eslint..."
        eslint "$SOURCE_FILE" > "$REPORT_FILE" 2>&1
        ;;

    java)
        echo "Running checkstyle..."
        if [[ ! -f "$CHECKSTYLE_JAR" ]]; then
            echo "Missing checkstyle.jar. Please download it first."
            exit 1
        fi
        java -jar "$CHECKSTYLE_JAR" -c "$CHECKSTYLE_CONFIG" "$SOURCE_FILE" > "$REPORT_FILE" 2>&1
        ;;

    go)
        echo "Running go vet..."
        go vet "$SOURCE_FILE" > "$REPORT_FILE" 2>&1
        ;;

    cpp)
        echo "Running cpplint..."
        cpplint "$SOURCE_FILE" > "$REPORT_FILE" 2>&1
        ;;

    bash)
        echo "Running shellcheck..."
        shellcheck "$SOURCE_FILE" > "$REPORT_FILE" 2>&1
        ;;

    *)
        echo "Unsupported language: $LANGUAGE"
        echo "Supported languages: python, javascript, java, go, cpp, bash"
        exit 1
        ;;
esac

echo "Lint check completed. Report saved to $REPORT_FILE"
```
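
As a sketch of how an agent could call this script from Python (assuming it is saved as `lint_tool.sh`; the suffix map and report handling are illustrative):

```python
import subprocess
import tempfile

# Hypothetical mapping from language name to the file suffix the lint tools expect.
SUFFIX = {"python": ".py", "javascript": ".js", "java": ".java", "go": ".go", "cpp": ".cpp", "bash": ".sh"}

def lint_code(code: str, language: str) -> str:
    """Save the code to a temporary file and run lint_tool.sh on it."""
    with tempfile.NamedTemporaryFile("w", suffix=SUFFIX[language], delete=False) as f:
        f.write(code)
        path = f.name
    subprocess.run(["bash", "lint_tool.sh", language, path], capture_output=True, text=True, timeout=60)
    # The script writes its findings to lint_report_<language>.txt in the working directory.
    with open(f"lint_report_{language}.txt") as report:
        return report.read()
```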

#### Code Execution Tool

Currently, the code execution tool is only designed to support `Python`.

* Prevent code injection

* Use Python Abstract Syntax Tree (AST) to detect and block dangerous operations such as `import os`, `exec`, and `__import__`.

```python
import ast

BLOCKED = {"os", "subprocess", "exec", "eval", "__import__"}  # modules/builtins to reject

code = "import os\nexec('print(1)')"  # example of user-provided code
tree = ast.parse(code)
# analyze each node in the AST tree and reject dangerous imports or calls
for node in ast.walk(tree):
    if isinstance(node, ast.Import) and any(a.name.split(".")[0] in BLOCKED for a in node.names):
        raise ValueError("Blocked import detected")
    if isinstance(node, ast.ImportFrom) and node.module and node.module.split(".")[0] in BLOCKED:
        raise ValueError(f"Blocked import detected: {node.module}")
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id in BLOCKED:
        raise ValueError(f"Blocked call detected: {node.func.id}")
```
* Install dependencies automatically

* The code execution tool needs to support extracting and installing dependencies from the source code automatically.
* AST is used here to extract the libraries referenced by `import` / `from xxx import xxx` statements.

```python
import ast

def extract_imports(code_str):
    tree = ast.parse(code_str)
    imports = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imports.add(alias.name.split('.')[0])
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.add(node.module.split('.')[0])
    return list(imports)
```
* Sandbox execution

* To ensure that the code runs in a fully isolated environment, the tool needs to use a container-based sandbox such as `Docker`.
* Given resource constraints, resource limits (memory, CPU, number of processes) and security policies are needed.
* Since the dependencies need to be installed first, network access is handled in two stages:
1. preparation: install the dependencies into a mounted path
```bash
# remove all the capabilities of the container except for needed ones
docker run --rm --cap-drop ALL --cap-add ${what_is_needed} -v $(pwd)/code:/code \
sandbox-python:3.10 \
bash -c "pip install -r /code/requirements.txt -t /code/.deps"
```
2. execution: run codes using mounted dependencies
```bash
docker run --rm --cap-drop ALL --cap-add ${what_is_needed} -v $(pwd)/code:/code --network=none \
sandbox-python:3.10 \
python3 -I -E -S /code/user_code.py
```
* After each execution, the mounted folder (containing the installed dependencies) is cleaned up.
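
A rough sketch of how the execution tool might drive these two stages from Python, adding the resource limits mentioned above (`--memory`, `--cpus`, `--pids-limit`) and cleaning up the mounted dependencies afterwards; the paths, image name, and limit values are illustrative assumptions:

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical layout: ./code contains user_code.py and requirements.txt and is mounted into the sandbox.
CODE_DIR = Path("code").resolve()
LIMITS = ["--memory=256m", "--cpus=1", "--pids-limit=64"]  # resource limits
BASE = ["docker", "run", "--rm", "--cap-drop", "ALL", *LIMITS, "-v", f"{CODE_DIR}:/code"]
IMAGE = "sandbox-python:3.10"

def run_in_sandbox(timeout_s: int = 30) -> subprocess.CompletedProcess:
    """Two-stage run: install dependencies with network access, then execute without it."""
    try:
        # Stage 1: dependency installation into the mounted path (network still enabled)
        subprocess.run(
            BASE + [IMAGE, "bash", "-c", "pip install -r /code/requirements.txt -t /code/.deps"],
            check=True, timeout=timeout_s * 2)
        # Stage 2: execution with networking disabled and an isolated interpreter
        return subprocess.run(
            BASE + ["--network=none", IMAGE, "python3", "-I", "-E", "-S", "/code/user_code.py"],
            capture_output=True, text=True, timeout=timeout_s)
    finally:
        # Clean up the installed dependencies after each run
        shutil.rmtree(CODE_DIR / ".deps", ignore_errors=True)
```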

## Expected Benefits

| Feature | Benefits |
| ---------------------------- | ----------------------------------------------------- |
| Input Code Validation | Catches errors early, preventing faulty translations. |
| Output Code Validation | Ensures reliable and accurate code conversion. |
| Automated Debug Feedback | Reduces trial-and-error, improving LLM accuracy. |
| Lint Static Code Check | Catch bugs early and enforce consistent code quality. |
| Secure Execution Environment | Protects the system from malicious code. |
| Error Classification | Identifies syntax and logic errors for better debugging. |

## Risks and Mitigations / User Workarounds

* Node / cluster takeover via execution of malicious code
  * Mitigation: automated vetting of the executed code + strict sandboxing
* Code execution exhausting node resources
  * Mitigation: strict resource usage limits
* Application response taking too long due to dependency install / code execution
  * Mitigation: dependency caching + enforced execution timeouts + error response to user
  * Workaround: user disables linting / code execution
* Users can affect each other's results
  * Mitigation: (dependency) caching is per-user session
* Code execution failing the translation due to limits / sandboxing / a dependency being offline
  * Workaround: user disables code execution / linting

## Implementation Plan

### Phase 1: Develop Code Execution Tool, target v1.3

- Research on the Code Execution Tool.

### Phase 2: Core Feature Development, target v1.4

- Develop the Lint Check Tool bash script.
- Develop the Code Execution Tool in Agent to provide a secure execution environment.
- Implement the Pre-LLM Agent for input code validation.
- Improve UI integration by providing a code execution interface and displaying execution results.

### Phase 3: Agent Integration, target v1.4

- Integrate the LLM MicroService with Agent.
- Optimize the CodeTrans megaservice to automate the scheduling of Agents.
- Implement the Post-LLM Agent for output validation and LLM feedback handling.

### Phase 4: Optimization & Expansion, target v1.4

- Set a maximum retry limit to prevent infinite LLM regeneration loops.
- Provide debugging suggestions to enhance user experience.