Website Scraper and Question Generator

Problem Statement by Overlayy Gen AI PS

Website Scraper and Question Generator

This project provides a comprehensive solution for scraping website content, generating concise questions based on the scraped data, identifying relevant links, and evaluating the quality of generated questions and link relevance. The script is designed to help automate content analysis and question generation using the OpenAI GPT-3.5-turbo model.

Features

Website Scraping: Extracts all hyperlinks and their associated textual content from a given website.
Question Generation: Automatically generates concise questions based on the content scraped from each webpage using OpenAI's GPT-3.5-turbo model.
Relevant Link Detection: Identifies and ranks the most relevant links based on the similarity of the content.
Evaluation: Includes basic evaluation functions to assess the relevance, clarity, conciseness, and coverage of the generated questions. Additionally, it evaluates the precision, recall, and F1-score of the relevance detection.

Requirements

Python 3.x
requests library
BeautifulSoup from the bs4 library
openai library
scikit-learn library

Install the required Python packages with:

pip install requests beautifulsoup4 openai scikit-learn

Usage

Set up your OpenAI API key: After run the code you to enter your OpenAI API key at runtime.
Run the code: Execute the file and provide the URL of the website you wish to scrape. The code will automatically scrape the content, generate questions, and evaluate the results.

python script.py

Output: The generated questions, relevant links, and topics will be saved in an output.json file in the current directory. The script will also print the results and evaluations to the console.

Example

Here's a basic example of how the code works:

Input: A website URL (e.g., https://example.com).
Output: A JSON file containing:
- List of questions generated from each webpage.
- Relevant links detected for each webpage.
- Main topics extracted from each webpage's content.

Evaluation

The code includes simple evaluation metrics for the generated questions and relevance detection. You can modify the evaluation logic as per your specific use case.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
GEN_AI_PS (1).ipynb		GEN_AI_PS (1).ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Website Scraper and Question Generator

Features

Requirements

Usage

Example

Evaluation

About

Uh oh!

Releases

Packages

Languages

bleedblack1/IITK_hackathon-

Folders and files

Latest commit

History

Repository files navigation

Website Scraper and Question Generator

Features

Requirements

Usage

Example

Evaluation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages