Problem Statement by Overlayy Gen AI PS
This project provides a comprehensive solution for scraping website content, generating concise questions based on the scraped data, identifying relevant links, and evaluating the quality of generated questions and link relevance. The script is designed to help automate content analysis and question generation using the OpenAI GPT-3.5-turbo model.
- Website Scraping: Extracts all hyperlinks and their associated textual content from a given website.
- Question Generation: Automatically generates concise questions based on the content scraped from each webpage using OpenAI's GPT-3.5-turbo model.
- Relevant Link Detection: Identifies and ranks the most relevant links based on the similarity of the content.
- Evaluation: Includes basic evaluation functions to assess the relevance, clarity, conciseness, and coverage of the generated questions. Additionally, it evaluates the precision, recall, and F1-score of the relevance detection.
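As a rough illustration of the scraping and relevant-link features listed above, the sketch below extracts hyperlinks from a page and ranks candidate pages by content similarity. It is not the project's implementation. To stay dependency-free it uses the standard library's html.parser in place of BeautifulSoup and plain word counts in place of whatever weighting the script applies with scikit-learn. The names LinkCollector, extract_links, and rank_links are hypothetical.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (stdlib stand-in for BeautifulSoup)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the URL of the page being parsed.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return every hyperlink found in an HTML string, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def term_counts(text):
    """Lower-cased word counts; an assumed, crude substitute for TF-IDF weighting."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_links(page_text, candidates, top_n=3):
    """candidates maps URL -> page text; returns up to top_n (url, score) pairs, best first."""
    source = term_counts(page_text)
    scored = [(url, cosine_similarity(source, term_counts(text)))
              for url, text in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]
```

In a real run, the HTML would come from requests.get(url).text, and the candidate texts from the pages behind the extracted links.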
- Python 3.x
- requests library
- BeautifulSoup from the bs4 library
- openai library
- scikit-learn library
Install the required Python packages with:
pip install requests beautifulsoup4 openai scikit-learn
- Set up your OpenAI API key: After you run the code, you will be prompted to enter your OpenAI API key at runtime.
- Run the code: Execute the file and provide the URL of the website you wish to scrape. The code will automatically scrape the content, generate questions, and evaluate the results.
python script.py
- Output: The generated questions, relevant links, and topics will be saved in an output.json file in the current directory. The script will also print the results and evaluations to the console.
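The runtime prompts described in the steps above might look something like the following. This is a minimal sketch, not the script's actual code: the function name collect_inputs, the prompt wording, and the URL validation are all assumptions. The key is read with getpass so it is not echoed to the terminal.

```python
import getpass
from urllib.parse import urlparse


def collect_inputs(ask_key=getpass.getpass, ask_url=input):
    """Prompt for the API key (hidden) and the start URL, with basic validation.

    The two prompt functions are parameters so they can be replaced in tests.
    """
    api_key = ask_key("OpenAI API key: ").strip()
    if not api_key:
        raise ValueError("An OpenAI API key is required.")

    url = ask_url("Website URL to scrape: ").strip()
    parsed = urlparse(url)
    # Accept only absolute http(s) URLs; this check is an assumption of the sketch.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a valid http(s) URL: {url!r}")
    return api_key, url
```

Calling collect_inputs() with no arguments prompts interactively. Passing the key at runtime rather than hard-coding it keeps it out of the source file.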
Here's a basic example of how the code works:
- Input: A website URL (e.g., https://example.com).
- Output: A JSON file containing:
  - List of questions generated from each webpage.
  - Relevant links detected for each webpage.
  - Main topics extracted from each webpage's content.
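The exact layout of output.json is not documented here, so the following is only one plausible shape: a dictionary keyed by page URL, with hypothetical keys questions, relevant_links, and topics. The values are placeholders, not real output from the script.

```python
import json


def save_results(results, path="output.json"):
    """Write per-page results to a JSON file in a readable, indented form."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2, ensure_ascii=False)


def load_results(path="output.json"):
    """Read the results file back into a dictionary."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical layout: one entry per scraped page, keyed by URL.
# All values below are placeholders for illustration only.
example_results = {
    "https://example.com": {
        "questions": ["What does this site offer?"],
        "relevant_links": ["https://example.com/about"],
        "topics": ["overview"],
    }
}
```

If the real file uses a different structure, only the keys in example_results need to change; the save and load helpers are independent of the schema.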
The code includes simple evaluation metrics for the generated questions and relevance detection. You can modify the evaluation logic as per your specific use case.
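For the relevance-detection metrics, precision, recall, and F1-score can be computed by comparing the links the script returns against a hand-labelled set of links judged relevant. The sketch below assumes that set-based setup; the function name and the idea of a labelled reference set are assumptions, and the script's own evaluation may differ.

```python
def precision_recall_f1(predicted, expected):
    """Compare predicted links against a reference set of relevant links.

    precision = correct predictions / all predictions
    recall    = correct predictions / all relevant links
    f1        = harmonic mean of precision and recall
    """
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

For example, with predicted links {a, b, c} and relevant links {a, b, d, e}, two of three predictions are correct (precision 2/3) and two of four relevant links are found (recall 1/2), giving an F1-score of 4/7. scikit-learn's sklearn.metrics module offers precision_score, recall_score, and f1_score for the same calculation on label arrays.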