This Python script processes images in a specified folder, sends them to the OpenAI API and saves the responses as text files.
- Image Encoding: Encodes images into base64 format for API requests.
- API Interaction: Sends images and prompts to the OpenAI API to generate descriptions or answers related to the images.
- MIME Type Handling: Determines the correct MIME type for various image formats (.jpg, .jpeg, .png, .gif, .bmp).
- Folder Management: Automatically creates necessary folders (
images
andimages/answers
) if they don't exist. - Error Handling: Includes basic error handling for file operations and API requests.
-
Python 3.8 or higher
-
OpenAI Python library (
pip install openai
) -
python-dotenv
library (pip install python-dotenv
) -
An OpenAI API key
-
A
.env
file in the root directory of the script containing the following:OPENAI_API_KEY=<your_openai_api_key> ROLE_PROMPT=<your_system_role_prompt> CONTENT_PROMPT=<your_user_content_prompt>
OPENAI_API_KEY
: Your OpenAI API key.ROLE_PROMPT
: The role prompt (system prompt) to use for the OpenAI API.CONTENT_PROMPT
: The content prompt (user prompt) to use for the OpenAI API.OPENAI_MODEL
: The model to use for the OpenAI API requests (optional, defaults togpt-4o-mini
).
-
Clone the repository:
git clone <repository_url> cd <repository_name>
-
Install dependencies:
pip install -r requirements.txt
(Assuming you have a
requirements.txt
file withopenai
andpython-dotenv
) -
Create a
.env
file:- Create a file named
.env
in the root directory of your project. - Add your OpenAI API key, role prompt, and content prompt to the
.env
file as described in the "Prerequisites" section.
- Create a file named
-
Place images in the
images
folder:- Put the images you want to process into the
images
folder, which will be created automatically in the same directory as the script if it doesn't exist.
- Put the images you want to process into the
-
Run the script:
python main.py
-
Find the responses:
- The script will process each image in the
images
folder. - For each image (e.g.,
image1.jpg
), a corresponding text file (e.g.,image1_answer.txt
) will be created in theimages/answers
folder containing the response from the OpenAI API.
- The script will process each image in the
- Takes an image path as input.
- Opens the image in binary read mode (
"rb"
). - Reads the image content.
- Encodes the image data into a base64 string using
base64.b64encode()
. - Decodes the base64 string to UTF-8 for compatibility with JSON.
- Returns the base64 encoded image string.
- Takes an image path and its extension as input.
- Defines a dictionary
mime_types
to map image file extensions to their corresponding MIME types. - Calls
encode_image()
to get the base64 representation of the image. - Sends a request to the OpenAI API using
client.chat.completions.create()
.- Specifies the model as
"gpt-4o-mini"
. - Constructs the message with a system role and a user role.
- System role includes the
role_prompt
defined in the.env
file. - User role includes the
content_prompt
and the image data. - The image data is formatted as an
image_url
with the appropriate MIME type and the base64 encoded image.
- System role includes the
- Sets
max_tokens
to 300 to limit the response length.
- Specifies the model as
- Prints the raw API response.
- Extracts the content of the response (the description or answer) from
response.choices[0].message.content
. - Returns the extracted content.
- Takes a file path as input.
- Uses
mimetypes.guess_type()
to determine the MIME type of the file based on its extension. - Returns
True
if the MIME type starts with"image"
, indicating it's an image file; otherwise, returnsFalse
.
- Takes a folder path as input.
- Iterates through each file in the specified folder using
os.listdir()
. - For each file, checks if it's an image using
is_image()
. - If it's an image:
- Extracts the file name and extension using
os.path.splitext()
. - Constructs the full path to the image file.
- Constructs the path for the corresponding answer file in the
images/answers
folder. - Checks if an answer file already exists. If not:
- Calls
image_requests()
to get the response from the OpenAI API. - Writes the response to the answer file.
- Prints a message indicating that the image was processed and the answer was saved.
- Calls
- Extracts the file name and extension using
- Takes a folder path as input.
- Checks if the folder exists using
os.path.exists()
. - If the folder doesn't exist, it creates it using
os.makedirs()
.
- Ensures that the code inside this block is executed only when the script is run directly (not imported as a module).
- Gets the current working directory using
os.getcwd()
and sets it asapp_path
. - Constructs the path to the
images
folder. - Calls
check_folder()
to create theimages
folder and theimages/answers
subfolder if they don't exist. - Calls
process_images_files()
to process the images in theimages
folder. - Includes a
try...except
block to catch any exceptions during the process and print an error message.
- The script assumes you are using the
gpt-4o-mini
model. You can modify themodel
parameter inimage_requests()
if you want to use a different model. - The script currently has a hardcoded
max_tokens
value of 300. You might need to adjust this based on your needs and the complexity of the expected responses. - Make sure to replace the placeholder values in the
.env
file with your actual API key and prompts. - This is a basic implementation. You can extend it further by adding features like batch processing, more sophisticated error handling, logging, and user interface elements.
If you have any feedback about the project, please let me know. I am always looking for ways to improve the user experience.