WIP: 84 a solid version controlled copy of the prompts #86
Conversation
.gitignore
Outdated
@@ -9,6 +9,9 @@ results
 # ignore excel files
 **.xlsx

+# ignore openai api keys
+**.env
Good thinking :)
info_box = str(item.get("Info_Box"))
Whole_text = process_whole_text(item)

prompt_building_damage_country_0715 = f"""Based on the provided article {info_box} {Whole_text},
Tip: This prompt never changes! So if you create a "template" for it in Python to format, there would be no reason to build it inside the for-loop like this.
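A minimal sketch of that idea (the template text and the `build_prompt` helper are illustrative, not the repo's actual code):

```python
# Define the template once at module level instead of rebuilding it per item.
# Placeholders are filled with str.format inside the loop.
BUILDING_DAMAGE_PROMPT = (
    "Based on the provided article {info_box} {whole_text}, "
    "extract the building damage per country."
)

def build_prompt(info_box: str, whole_text: str) -> str:
    return BUILDING_DAMAGE_PROMPT.format(info_box=info_box, whole_text=whole_text)
```

The loop then only calls `build_prompt(info_box, Whole_text)` and the prompt text lives in one place.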
response_gpt4o = []

for item in data:
    Event_ID = str(item.get("Event_ID"))
Tip: These loops are almost identical, so you can avoid repetition by having a function that does everything inside this forloop (gets the event_id and source, sets up the prompt, etc...)
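One possible shape for that refactor (field names like `"Source"` and the template/placeholder names are assumptions for illustration):

```python
# Hypothetical helper that does everything the near-identical loops repeat:
# read the per-event fields, build the prompt, call the model.
def process_event(item, prompt_template, completion_fn):
    event_id = str(item.get("Event_ID"))
    source = str(item.get("Source"))  # "Source" is an assumed field name
    prompt = prompt_template.format(event_id=event_id)
    answer = completion_fn(prompt)
    return {"Event_ID": event_id, "Source": source, "Answer": answer}
```

Each experiment then collapses to one comprehension, e.g. `results = [process_event(item, template, completion_4) for item in data]`.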
# skip the multi events

from json.decoder import JSONDecodeError
I guess you are re-importing the JSON decoder here because you probably had this import in many Jupyter cells. Now is the time to remove the duplicates.
json.dump(response_gpt4o, json_file, indent=4)
from json.decoder import JSONDecodeError

response_gpt4o = []
It's good for this to also be in a function, because if you forget to reset it to []
in any of these runs, it will carry over items from the previous one.
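A sketch of what wrapping a run in a function buys here (`run_experiment` is a hypothetical name):

```python
# Because the list is created inside the function, every call starts empty;
# results can never leak from a previous run.
def run_experiment(data, completion_fn):
    responses = []
    for item in data:
        responses.append(completion_fn(item))
    return responses
```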
import json

# Specify the file path
file_path = input("File for prompting experiments:")
    answer_dict = json.loads(answer_str)
    event_info.update(answer_dict)

except JSONDecodeError as e:
Good job with catching errors :)
openai.api_key = api_key


def completion_4(prompt):
Just out of curiosity, why is it called completion_4?
# Saving the results for all events to a JSON file
with open(
Tip: This storage script is also repetitive. One recommendation would be to turn it to a small function.
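For example, something along these lines (the `save_results` name and signature are suggestions, not the repo's code):

```python
import json

# Single shared writer: every experiment stores its output the same way,
# instead of repeating the open/dump block.
def save_results(results, path):
    with open(path, "w", encoding="utf-8") as json_file:
        json.dump(results, json_file, indent=4)
```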
Database/Prompts/single_prompts_L1-3/prompts_V20240715_GPT4o_V0513.py
Outdated
…r-content pair, also with the run_prompt.py file, in formatting the batch file process
As discussed, the
What is the difference between the two? It's a bit hard for me to understand.

@i-be-snek, the Prompting/gpt4_o_experiment_1 will be deleted in the end; it's for testing.
Database/Prompts/run_prompts.py
Outdated
# Step 2: Load the JSON data into a Python dictionary
raw_text = json.load(file)

# notice that due to the different versions of prompts applied, the keys may be a bit different; below is version V_3
I think it's good to only support V_3 here 🤔 since we mentioned that for V_1 (?) in the nlp4climate paper, all the prompts are in the appendix.
I will think about how to present it. If we use the GPT4o-08-06 version, we need to define something else, and the prompt template will also change. Maybe we can split it into separate functions and, depending on the prompt version, choose between the different processes.
Could you show me an example of how these keys could differ, just to understand the problem better?
That small section can be a function on its own. It's good to try not to have a lot of repeated code.
@i-be-snek, yes, so separate functions mean that we may split the prompts into two, for example:

"affected_L1/L2": """xxxx"""
"affected_L3": """xxx"""

Then the keys will change, and when we put them into the batch file, we need to append the key after the custom_id, which makes it easier to retrieve the results in the end.
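A rough sketch of that custom_id scheme, assuming the OpenAI batch JSONL request shape (the `batch_request` helper and the model name are illustrative):

```python
def batch_request(event_id: str, prompt_key: str, prompt: str) -> dict:
    # custom_id carries both the event and the prompt key, so each answer
    # can be routed back to the right field when results are retrieved.
    return {
        "custom_id": f"{event_id}-{prompt_key}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": prompt}],
        },
    }
```

Splitting the custom_id back on the last "-" recovers which prompt produced each result line.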
Okay.
Hi @i-be-snek and @MurathanKurfali, I think this is finished with clean code, and we can merge to main, thanks!
Database/Prompts/run_prompts.py
Outdated
@@ -7,7 +7,8 @@
 import openai
 from dotenv import load_dotenv

-from Database.Prompts.prompts import V_3  # change here to choose the version of prompts
+# the newest version of prompts are applied
+from Database.Prompts.prompts import V_3
You can alias the prompt dictionary you are importing:

from Database.Prompts.prompts import V_3 as target_prompts

and use target_prompts instead of V_3 later in the code. Then each time you change the target prompt dictionary (say you want to use V_2 instead), you only change the import and the rest of the code needs no changes.
done! thanks!
I need support for packaging the prompt code so it can be used directly from a .py file.
/Ni