WIP: Formatting #148

Open
wants to merge 14 commits into
base: main

Conversation

i-be-snek
Collaborator

This PR is meant to do two things:

(1) format any left-over files using the pre-commit hook (done automatically)
(2) improve the README, especially after a large number of changes were made to the pipeline

README.md Outdated
Before running the pipeline, choose a prompt version by editing the import at the top of **run_prompts.py**:

```python
from Database.Prompts.prompts import V_3 as target_prompts
```

#### (Step 1) Raw output
- Choose the raw file contains the text you need to process, please use the clear raw file name to indicate your experiment, this name will be used as the output file, the api env you want to use, the description of the experiment, the prompt category, and the batch file location you want to store the batch file (this is not mandatory, but it's good to check if you create correct batch file)
+ Choose the raw file that contains the text you need to process. Please use clear raw file names to indicate your experiment. This name will be used as the output file, the api env you want to use, the description of the experiment, the prompt category, and the batch file location you want to store the batch file (this is not mandatory, but it's good to check if you create correct batch file)
Collaborator Author


@liniiiiii

I don't understand this sentence:

This name will be used as the output file, the api env you want to use, the description of the experiment, the prompt category, and the batch file location you want to store the batch file (this is not mandatory, but it's good to check if you create correct batch file)

Is it suggesting that the experiment name and description and category will all be the name of the output file?

Maybe adding a pseudo example (or a real example) could help.

Collaborator


Thanks, I will do that. Where can I edit it, in the same branch?

Collaborator Author


You can edit the same branch. I think for READMEs you can even safely edit directly on the GitHub website :D

@i-be-snek added the documentation label (Improvements or additions to documentation) on Oct 2, 2024
README.md Outdated (resolved)
@liniiiiii
Collaborator

Please keep this PR open for a while; I will check the other READMEs I edited later. Thanks!

Before running the pipeline, choose a prompt version by editing the import at the top of **run_prompts.py**:

```python
from Database.Prompts.prompts import V_3 as target_prompts
```
##### Step 1: Experiment Settings
Collaborator Author


Wow, this looks great! Thanks :D

One thing that could help the reader is to say that these are the params to pass into run_prompts

Suggested change
- ##### Step 1: Experiment Settings
+ ##### Step 1: Experiment Settings
+ Here is what you need to begin an experiment run with `Database/Prompts/run_prompts.py`:

4. **Prompt Category**: Indicate the prompt category, such as "all".

5. **Batch File Location** (Optional): Specify where to store the batch file. This helps verify the batch file's creation.
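
As a rough illustration of how the Step 1 settings relate to each other, here is a minimal sketch. `ExperimentSettings`, its field names, the example values, and the rule that the batch file reuses the raw file's stem are all assumptions made for illustration; none of them are taken from the repository. The three category names and the `.jsonl` suffix come from the help text of `run_prompts.py`.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Categories named in the help text of run_prompts.py.
VALID_PROMPT_CATEGORIES = ("impact", "basic", "all")


@dataclass(frozen=True)
class ExperimentSettings:
    """Hypothetical container for the Step 1 settings (not part of the repository)."""

    raw_filename: str                 # raw file holding the text to process
    api_env: str                      # env file that contains the API keys
    description: str                  # free-text description of the experiment
    prompt_category: str = "all"      # one of VALID_PROMPT_CATEGORIES
    batch_dir: Optional[str] = None   # optional batch file location

    def __post_init__(self) -> None:
        # Reject categories the pipeline does not list.
        if self.prompt_category not in VALID_PROMPT_CATEGORIES:
            raise ValueError(
                f"prompt_category must be one of {VALID_PROMPT_CATEGORIES}, "
                f"got {self.prompt_category!r}"
            )

    def batch_file(self) -> Optional[Path]:
        """Assumed naming rule: raw file stem plus a .jsonl suffix."""
        if self.batch_dir is None:
            return None
        return Path(self.batch_dir) / f"{Path(self.raw_filename).stem}.jsonl"


# Hypothetical values, for illustration only.
settings = ExperimentSettings(
    raw_filename="wiki_dev_example.json",
    api_env=".env",
    description="dev-set run with V_3 prompts",
    prompt_category="all",
    batch_dir="batches",
)
print(settings.batch_file())
```

Keeping the batch directory optional mirrors the note that the batch file location is not mandatory: without it, `batch_file()` simply returns `None`.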

Collaborator Author


Could we add something like this:

```shell
# check the args and flags
poetry run python3 Database/Prompts/run_prompts.py --help
```

Collaborator Author


Output:

```shell
wikimpacts-py3.11➜  Wikimpacts git:(drop-l1-missing-all-impacts) ✗ poetry run python3 Database/Prompts/run_prompts.py --help

usage: run_prompts.py [-h] [-f FILENAME] [-r RAW_DIR] [-b BATCH_DIR] [-m MODEL_NAME] [-t MAX_TOKENS] [-e API_ENV] [-d DESCRIPTION] [-p PROMPT_CATEGORY]

options:
  -h, --help            show this help message and exit
  -f FILENAME, --filename FILENAME
                        The name of the json file in the <Wikipedia articles> directory
  -r RAW_DIR, --raw_dir RAW_DIR
                        The directory containing Wikipedia json files to be run
  -b BATCH_DIR, --batch_dir BATCH_DIR
                        The directory where the batch file will land (as .jsonl)
  -m MODEL_NAME, --model_name MODEL_NAME
                        The model version applied in the experiment, like gpt-4o-mini.
  -t MAX_TOKENS, --max_tokens MAX_TOKENS
                        The max tokens of the model selected
  -e API_ENV, --api_env API_ENV
                        The env file that contains the API keys.
  -d DESCRIPTION, --description DESCRIPTION
                        The description of the experiment
  -p PROMPT_CATEGORY, --prompt_category PROMPT_CATEGORY
                        The prompt category of the experiment, can only choose from impact, basic, and all
```
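
For readers who want to see how the flags above fit together, the help output can be approximated with a small `argparse` parser. This is a sketch reconstructed from the help text only, not the actual code in `run_prompts.py`. Parsing `--max_tokens` as an integer, enforcing the categories through `choices`, and every example value passed to the parser are assumptions.

```python
import argparse

# Categories named in the help text.
PROMPT_CATEGORIES = ("impact", "basic", "all")


def build_parser() -> argparse.ArgumentParser:
    """Approximate the interface shown in the help output (a sketch, not the script's code)."""
    parser = argparse.ArgumentParser(prog="run_prompts.py")
    parser.add_argument("-f", "--filename",
                        help="The name of the json file in the <Wikipedia articles> directory")
    parser.add_argument("-r", "--raw_dir",
                        help="The directory containing Wikipedia json files to be run")
    parser.add_argument("-b", "--batch_dir",
                        help="The directory where the batch file will land (as .jsonl)")
    parser.add_argument("-m", "--model_name",
                        help="The model version applied in the experiment, like gpt-4o-mini.")
    # Assumption: the token limit is parsed as an integer.
    parser.add_argument("-t", "--max_tokens", type=int,
                        help="The max tokens of the model selected")
    parser.add_argument("-e", "--api_env",
                        help="The env file that contains the API keys.")
    parser.add_argument("-d", "--description",
                        help="The description of the experiment")
    # Assumption: invalid categories are rejected at parse time via `choices`.
    parser.add_argument("-p", "--prompt_category", choices=PROMPT_CATEGORIES,
                        metavar="PROMPT_CATEGORY",
                        help="The prompt category of the experiment, "
                             "can only choose from impact, basic, and all")
    return parser


# Hypothetical argument values, for illustration only.
args = build_parser().parse_args(
    ["-f", "example_articles.json", "-m", "gpt-4o-mini", "-t", "4096", "-p", "all"]
)
print(args.filename, args.model_name, args.max_tokens, args.prompt_category)
```

With `choices` in place, a category outside the three listed ones makes `argparse` print an error and exit instead of starting a run with an invalid setting.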

Collaborator


Thanks for the suggestion, I will check them out after I have fixed the visualization!

@@ -28,42 +30,137 @@ pre-commit installed at .git/hooks/pre-commit
git lfs install
```

- ## Quickstart
+ ## Development
Collaborator Author


As per the suggestion from @koffiworou, I've moved the dev doc section further to the top so that users can make sure they have all the basics and dependencies set up before developing.

Labels
documentation Improvements or additions to documentation

3 participants