
WIP: 84 a solid version controlled copy of the prompts #86

Merged
merged 15 commits into main from 84-a-solid-version-controlled-copy-of-the-prompts on Sep 5, 2024

Conversation

liniiiiii
Collaborator

I need support for formatting the prompt code into a package that can be used directly in a .py file.

/Ni

@liniiiiii linked an issue on Aug 22, 2024 that may be closed by this pull request
2 tasks
.gitignore Outdated
@@ -9,6 +9,9 @@ results
# ignore excel files
**.xlsx

# ignore openai api keys
**.env
Collaborator

Good thinking :)

info_box = str(item.get("Info_Box"))
Whole_text = process_whole_text(item)

prompt_building_damage_country_0715 = f"""Based on the provided article {info_box} {Whole_text},
Collaborator

Tip: This prompt never changes! So if you create a "template" for it in Python to format, there would be no reason to add it to the for-loop like this.
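For example, a minimal sketch of that idea (the template text and variable names here are only illustrative):

PROMPT_TEMPLATE = """Based on the provided article {info_box} {whole_text}, ..."""  # fixed text, defined once outside the loop

for item in data:
    info_box = str(item.get("Info_Box"))
    whole_text = process_whole_text(item)
    # only the per-item values are filled in inside the loop
    prompt = PROMPT_TEMPLATE.format(info_box=info_box, whole_text=whole_text)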

response_gpt4o = []

for item in data:
Event_ID = str(item.get("Event_ID"))
Collaborator

Tip: These loops are almost identical, so you can avoid repetition by having a function that does everything inside this for-loop (gets the event_id and source, sets up the prompt, etc...)
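Roughly like this, as a sketch (the helper name and the exact keys are assumptions based on the loops above):

def handle_item(item, prompt_template):
    # everything the loop bodies currently repeat, collected in one place
    event_id = str(item.get("Event_ID"))
    source = str(item.get("Source"))
    info_box = str(item.get("Info_Box"))
    whole_text = process_whole_text(item)
    prompt = prompt_template.format(info_box=info_box, whole_text=whole_text)
    return event_id, source, prompt

for item in data:
    event_id, source, prompt = handle_item(item, PROMPT_TEMPLATE)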


# skip the multi events

from json.decoder import JSONDecodeError
Collaborator

I guess you are re-importing the JSON decoder here because you probably had this in many cells in Jupyter. Now is the time to remove these duplicated imports.

json.dump(response_gpt4o, json_file, indent=4)
from json.decoder import JSONDecodeError

response_gpt4o = []
Collaborator

It's good for this to also be in a function, because if you forget to reset it to [] in any of these runs, it will carry over items from the previous one.
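A minimal sketch of that, where build_prompt and ask_model are hypothetical stand-ins for the existing code:

def run_experiment(data, build_prompt, ask_model):
    responses = []  # created fresh on every call, so nothing carries over between runs
    for item in data:
        prompt = build_prompt(item)
        responses.append(ask_model(prompt))
    return responses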

import json

# Specify the file path
file_path = input("File for prompting experiments:")


answer_dict = json.loads(answer_str)
event_info.update(answer_dict)

except JSONDecodeError as e:
Collaborator

Good job with catching errors :)

openai.api_key = api_key


def completion_4(prompt):
Collaborator

Just out of curiosity, why is it called completion_4?



# Saving the results for all events to a JSON file
with open(
Collaborator

Tip: This storage script is also repetitive. One recommendation would be to turn it into a small function.
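For instance, a small helper like this (just a sketch, the name is illustrative) could replace each repeated with open(...) / json.dump block:

import json

def save_results(results, file_path):
    # same json.dump call as before, in one reusable place
    with open(file_path, "w", encoding="utf-8") as json_file:
        json.dump(results, json_file, indent=4)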


@i-be-snek changed the title from "84 a solid version controlled copy of the prompts" to "WIP: 84 a solid version controlled copy of the prompts" on Aug 24, 2024

@liniiiiii
Collaborator Author

A general note:

Now I noticed that we have Database/Prompts, but we also have Prompting/gpt4_o_experiment_1, and both contain some prompts. At this point it might be good to think about those two directories. Ultimately and ideally, prompts should live in only one easy-to-find place or directory, so it might be worth deciding whether you want to move these into one of the two directories, or maybe edit the contents of Prompting/gpt4_o_experiment_1.

As discussed, Prompting/gpt4_o_experiment_1 will only contain the experiments, and Database/Prompts will contain the final version of the prompts and code.

@i-be-snek
Collaborator

i-be-snek commented Aug 28, 2024

@liniiiiii

As discussed, Prompting/gpt4_o_experiment_1 will only contain the experiments, and Database/Prompts will contain the final version of the prompts and code.

What is the difference between the two? It's a bit hard for me to understand.

@i-be-snek, the Prompting/gpt4_o_experiment_1 directory will be deleted in the end; it's only for testing.


# Step 2: Load the JSON data into a Python dictionary
raw_text = json.load(file)

# notice that due to the different versions of prompts applied, the keys may be a bit different; below is version V_3
Collaborator

I think it's good to only support V_3 here 🤔 since we mentioned that for V_1 (?) in the nlp4climate paper, all the prompts are in the appendix.

Collaborator Author

I will think about how to present it, because if we use the GPT4o-08-06 version we need to define something else, and the prompt template will also change. Maybe I can make it into separate functions and, depending on the prompt version, choose which process to use.
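A rough sketch of that idea (the function names are hypothetical placeholders):

def build_prompts_v3(item):
    # V_3-specific prompt handling would go here
    ...

def build_prompts_v2(item):
    # V_2-specific prompt handling would go here
    ...

PROMPT_BUILDERS = {"V_3": build_prompts_v3, "V_2": build_prompts_v2}

def build_prompts(version, item):
    # pick the process based on the prompt version
    return PROMPT_BUILDERS[version](item)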

Collaborator

Could you show me an example of how these keys could differ, just to understand the problem better?
That small section can be a function on its own. It's good to try not to have a lot of repeated code.

Collaborator Author

@i-be-snek, yes, separate functions mean that we may split the prompts into two, for example:

"affected_L1/L2": """ xxxx"""
"affected_L3":"""xxx"""

Then the key will change, and when we put them into the batch file, we need to append the key to the custom_id, which makes it easier to retrieve the results in the end.
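For example, a sketch of building one batch-file line that way (this assumes the OpenAI batch JSONL request format; the helper name and model value are placeholders):

import json

def batch_request_line(event_id, prompt_key, prompt, model="gpt-4o"):
    # a custom_id like "1234_affected_L3" maps each answer back to its prompt
    return json.dumps({
        "custom_id": f"{event_id}_{prompt_key}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {"model": model, "messages": [{"role": "user", "content": prompt}]},
    })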

Collaborator

Okay.

@liniiiiii
Collaborator Author

Hi @i-be-snek and @MurathanKurfali, I think this is finished with clear code, and we can merge it into main, thanks!

@liniiiiii self-assigned this on Sep 4, 2024
@@ -7,7 +7,8 @@
import openai
from dotenv import load_dotenv

-from Database.Prompts.prompts import V_3 # change here to choose the version of prompts
+# the newest version of prompts are applied
+from Database.Prompts.prompts import V_3
Collaborator

you can rename the prompt dictionary you are importing, like this:

from Database.Prompts.prompts import V_3 as target_prompts

and use target_prompts instead of V_3 later in the code. That way, each time you change the target prompt dictionary (say you wanted to use V_2 instead), you only need to change the import and the rest of the code will not need any changes.
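A short usage sketch of that suggestion (the "affected_L3" key is just an example taken from this thread):

from Database.Prompts.prompts import V_3 as target_prompts

# the rest of the code only refers to target_prompts, so switching to V_2 later
# only means changing the import line above
prompt_template = target_prompts["affected_L3"]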

Collaborator Author

done! thanks!

@liniiiiii merged commit 31510ab into main on Sep 5, 2024
1 check passed
@liniiiiii deleted the 84-a-solid-version-controlled-copy-of-the-prompts branch on September 5, 2024 at 11:52

Successfully merging this pull request may close these issues.

A solid, version-controlled copy of the prompts