
docs:Utilizing Llama4 long context window to do data generation by role playing without RAG #2155

Open

wants to merge 6 commits into base: master
Conversation

zjrwtx (Collaborator)
@zjrwtx zjrwtx commented Apr 9, 2025

Utilizing Llama4 long context window to do data generation by role playing without RAG

Description

Describe your changes in detail (optional if the linked issue already contains a detailed description of the changes).

Checklist

Go over all the following points, and put an x in all the boxes that apply.

  • I have read the CONTRIBUTION guide (required)
  • I have linked this PR to an issue using the Development section on the right sidebar or by adding Fixes #issue-number in the PR description (required)
  • I have checked if any dependencies need to be added or updated in pyproject.toml and uv.lock
  • I have updated the tests accordingly (required for a bug fix or a new feature)
  • I have updated the documentation if needed
  • I have added examples if this is a new feature

If you are unsure about any of these, don't hesitate to ask. We are here to help!


@zjrwtx zjrwtx changed the title Utilizing Llama4 long context window to do data generation by role playing docs:Utilizing Llama4 long context window to do data generation by role playing Apr 9, 2025
@zjrwtx zjrwtx self-assigned this Apr 9, 2025
@zjrwtx zjrwtx added documentation Improvements or additions to documentation use case labels Apr 9, 2025
@zjrwtx zjrwtx changed the title docs:Utilizing Llama4 long context window to do data generation by role playing docs:Utilizing Llama4 long context window to do data generation by role playing with RAG Apr 9, 2025
@zjrwtx zjrwtx marked this pull request as draft April 9, 2025 15:15
@zjrwtx zjrwtx changed the title docs:Utilizing Llama4 long context window to do data generation by role playing with RAG docs:Utilizing Llama4 long context window to do data generation by role playing without RAG Apr 10, 2025
@zjrwtx zjrwtx requested a review from Wendong-Fan April 10, 2025 07:38
@zjrwtx zjrwtx marked this pull request as ready for review April 10, 2025 07:38
@zjrwtx zjrwtx requested a review from fengju0213 April 10, 2025 07:40
Collaborator

@fengju0213 fengju0213 left a comment

Thanks @zjrwtx, left some comments.

!ls
Collaborator

Maybe this can be deleted.

"\n",
"# add your topic input file\n",
"print(Fore.YELLOW + \"add your basic content file:\")\n",
"input_file =input(\"Enter the basic cotent file path (default basic_content.txt): \") or \"basic_content.txt\"\n",
Collaborator

Perhaps the path shouldn't be entered this way, as it doesn't align with common user habits. Moreover, the default value is "basic_content.txt", but the example doesn't provide the corresponding file.
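One possible direction for this suggestion (a minimal sketch, not part of the PR; `resolve_input_file` and the placeholder text are hypothetical): fall back to creating a small default file when the path is missing, so the notebook runs end to end without manual setup.

```python
from pathlib import Path

def resolve_input_file(path_str="basic_content.txt"):
    """Resolve the input path; create a small placeholder file if it is
    missing so first-time users are not blocked by a FileNotFoundError."""
    path = Path(path_str).expanduser()
    if not path.exists():
        path.write_text(
            "Example seed content for dialogue generation.\n",
            encoding="utf-8",
        )
    return path

# Read the (possibly freshly created) seed content.
content = resolve_input_file().read_text(encoding="utf-8")
```

A file-picker widget or a CLI argument would also fit common user habits better than a bare `input()` prompt.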

"\n",
"\n",
"\n",
"def load_input_file(file_path):\n",
Collaborator

Maybe more file formats can be supported; the IO module in CAMEL should be able to accomplish this.

"topics_file = input(\"Topics file name (default generated_topics.txt): \") or \"generated_topics.txt\"\n",
"output_dir = input(\"Output directory name (default generated_dialogues): \") or \"generated_dialogues\"\n",
"num_dialogues = int(input(\"Number of dialogues to generate per topic (default 1): \") or 1)\n",
"assistant_role = input(\"Assistant role name (default Python Programmer): \") or \"Python Programmer\"\n",
Collaborator

I think the assistant_role and user_role can be generated by the LLM when creating the topic, rather than being entered manually. And similar to the above, perhaps the way the file is input should be adjusted as well?
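One way this could work (a sketch under assumptions: the prompt format and `parse_roles` helper are hypothetical, and the actual model call would happen elsewhere): ask the model to emit the two roles on labeled lines alongside the topic, then parse them out instead of prompting the user.

```python
def parse_roles(reply: str) -> dict:
    """Extract 'Assistant role: X' / 'User role: Y' lines from a model
    reply in a hypothetical labeled-line format."""
    roles = {}
    for line in reply.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            key = key.strip().lower()
            if key in ("assistant role", "user role"):
                # "assistant role" -> "assistant_role", etc.
                roles[key.replace(" role", "_role")] = value.strip()
    return roles
```

Keeping the manual `input()` prompts as an override while defaulting to the parsed roles would preserve flexibility for users who want specific personas.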

Collaborator Author

@zjrwtx zjrwtx commented Apr 12, 2025

Thanks @fengju0213, I will change it.

Labels
documentation (Improvements or additions to documentation), use case
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants