
bleedblack1/IITK_hackathon-


Problem Statement by Overlayy Gen AI PS

Website Scraper and Question Generator

This project scrapes website content, generates concise questions from the scraped data, identifies relevant links, and evaluates both the quality of the generated questions and the relevance of the detected links. The script automates content analysis and question generation using OpenAI's GPT-3.5-turbo model.

Features

  • Website Scraping: Extracts all hyperlinks and their associated textual content from a given website.
  • Question Generation: Automatically generates concise questions based on the content scraped from each webpage using OpenAI's GPT-3.5-turbo model.
  • Relevant Link Detection: Identifies and ranks the most relevant links based on the similarity of the content.
  • Evaluation: Includes basic evaluation functions to assess the relevance, clarity, conciseness, and coverage of the generated questions. Additionally, it evaluates the precision, recall, and F1-score of the relevance detection.
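
The relevant-link feature ranks pages by content similarity. The README does not say which similarity measure the script uses, so the following is only a minimal sketch under assumptions: it uses cosine similarity over plain word counts, written with the standard library so it runs without dependencies. The function names and the `pages` mapping are hypothetical, and the real script may well use scikit-learn instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_relevant_links(page_url, pages, top_k=3):
    """Return up to top_k (url, score) pairs most similar to page_url.

    `pages` maps each URL to its scraped text (a hypothetical structure).
    """
    vectors = {url: Counter(tokenize(text)) for url, text in pages.items()}
    target = vectors[page_url]
    scored = [
        (url, cosine_similarity(target, vec))
        for url, vec in vectors.items()
        if url != page_url
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Illustrative data only; these pages and texts are made up.
pages = {
    "https://example.com/pricing": "pricing plans and monthly billing",
    "https://example.com/billing": "billing cycles, invoices and pricing",
    "https://example.com/careers": "open engineering roles and benefits",
}
ranked = rank_relevant_links("https://example.com/pricing", pages, top_k=2)
```

Raw word counts let common words such as "and" contribute to the score; a TF-IDF weighting would reduce that effect.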

Requirements

  • Python 3.x
  • requests library
  • BeautifulSoup from the bs4 library
  • openai library
  • scikit-learn library

Install the required Python packages with:

pip install requests beautifulsoup4 openai scikit-learn

Usage

  1. Set up your OpenAI API key: After you run the code, you need to enter your OpenAI API key at runtime.

  2. Run the code: Execute the file and provide the URL of the website you wish to scrape. The code will automatically scrape the content, generate questions, and evaluate the results.

python script.py
  3. Output: The generated questions, relevant links, and topics will be saved in an output.json file in the current directory. The script will also print the results and evaluations to the console.
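
To make the scraping step more concrete, here is a rough sketch of pulling hyperlinks and visible text out of one page. It is not the script's code: the project uses requests and BeautifulSoup, while this sketch swaps in the standard library's `html.parser` so it runs with no dependencies and no network access. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collects absolute hyperlinks and visible text from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_links_and_text(html, base_url):
    """Return (list of absolute links, page text joined by spaces)."""
    parser = LinkAndTextParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, " ".join(parser.text_parts)


# Inline HTML stands in for a downloaded page.
sample_html = """
<html><head><style>p { color: red; }</style></head>
<body><p>Welcome to the docs.</p>
<a href="/pricing">Pricing</a>
<a href="https://example.com/about">About</a>
</body></html>
"""
links, text = extract_links_and_text(sample_html, "https://example.com")
```

In a real run, the HTML string would come from an HTTP response for the URL you provide, and each discovered link would be fetched and parsed the same way.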

Example

Here's a basic example of how the code works:

  • Input: A website URL (e.g., https://example.com).
  • Output: A JSON file containing:
    • List of questions generated from each webpage.
    • Relevant links detected for each webpage.
    • Main topics extracted from each webpage's content.
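
The README does not document the exact layout of output.json, so the structure below is an assumption for illustration: one entry per page URL, with hypothetical keys `questions`, `relevant_links`, and `topics`. The helper `save_results` is likewise hypothetical.

```python
import json
from pathlib import Path


def save_results(results, path="output.json"):
    """Write the results dictionary to a JSON file with readable indentation."""
    Path(path).write_text(json.dumps(results, indent=2), encoding="utf-8")


# Made-up example entry; the real keys and values may differ.
results = {
    "https://example.com": {
        "questions": ["What does the product do?"],
        "relevant_links": ["https://example.com/pricing"],
        "topics": ["product overview"],
    }
}

serialized = json.dumps(results, indent=2)
```

Calling `save_results(results)` would write the same content to output.json in the current directory.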

Evaluation

The code includes simple evaluation metrics for the generated questions and relevance detection. You can modify the evaluation logic as per your specific use case.
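
As one way to adapt the relevance evaluation, precision, recall, and F1 can be computed by comparing the links the script marks as relevant against a hand-labeled set. This is a sketch under that assumption, not the script's own evaluation code; the function name and the sample link sets are hypothetical, and the script may rely on scikit-learn's metrics instead.

```python
def precision_recall_f1(predicted, expected):
    """Compute precision, recall, and F1 for two collections of links."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up labels: 2 of the 3 predicted links are in the expected set of 4.
predicted_links = ["/pricing", "/billing", "/careers"]
expected_links = ["/pricing", "/billing", "/invoices", "/plans"]
precision, recall, f1 = precision_recall_f1(predicted_links, expected_links)
```

Here precision is 2/3 and recall is 2/4, which gives an F1 of 4/7.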
