This project is a web scraper designed to scrape and download videos from Vidoza. It utilizes Selenium for scraping, Joblib for parallelization, and Requests for downloading the video files. The scraper is capable of handling multiple videos in parallel and efficiently downloading them to your system.
- Parallel Downloading: The scraper can download multiple videos simultaneously using Joblib, optimizing performance.
- Video Storage: Downloaded videos are saved in the
videos/
folder, with file names extracted from the webpage. - Error Handling: Robust error handling ensures the script retries downloading if it encounters errors.
├── scraper.py
Main script for scraping and downloading Vidoza videos (single-threaded).
├── scraper_parallel.py
Parallelized script for scraping and downloading multiple StreamTape videos concurrently.
- Clone the repository:
git clone https://github.com/satyampandey444/Vidoza_Video_Downloader
cd Vidoza_Video_Downloader
- Create a virtual environment (using pipenv for dependency management):
pipenv shell
- Install the dependencies:
pip install -r requirements.txt
-
Prepare the environment: Make sure you have Google Chrome installed (version 115 or higher). You can specify the path to Chrome WebDriver if needed. Adjust the environment variables in the .env file if necessary.
-
Provide the URLs to download: Ensure your links.txt file contains the StreamTape video URLs (one URL per line). The scraper will read this file to fetch and download the videos.
-
Run the Video Downloader:
If you want to download videos one at a time, run the following command:
python3 scrapperr.py
If you have a more powerful machine and want to download multiple videos concurrently, use the parallelized script:
python3 parallel_scrapper.py