A powerful React-based web scraping application that extracts structured data from websites and provides built-in analytics.
- Web scraping with customizable options
- Analytics dashboard with performance metrics
- Schema.org data extraction and validation
- Interactive visualizations (word clouds, network graphs, database schemas)
- Export functionality (GraphML, JSON, CSV)
- Comprehensive test coverage
- Node.js (v16 or higher)
- npm or yarn
- Clone the repository:

  ```bash
  git clone https://github.com/aledlie/SingleSiteScraper.git
  cd SingleSiteScraper
  ```

- Install dependencies:

  ```bash
  npm install
  ```

- Start the development server:

  ```bash
  npm run dev
  ```

- Open http://localhost:5173 in your browser
This project uses Vitest for testing with comprehensive coverage across all components and utilities.
All test files are organized in the tests/ directory, mirroring the source structure:
```
tests/
├── run_tests.sh          # Test runner script
└── src/
    ├── test-setup.ts     # Test configuration and mocks
    ├── components/       # Component tests
    ├── scraper/          # Scraper logic tests
    ├── utils/            # Utility function tests
    ├── analytics/        # Analytics engine tests
    ├── visualizations/   # Visualization component tests
    └── integration/      # End-to-end integration tests
```
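For example, a unit test for a function in `src/utils/` would live under `tests/src/utils/` and import from the mirrored source path. The file and function names below are hypothetical, shown only to illustrate the convention:

```ts
// tests/src/utils/normalizeUrl.test.ts — hypothetical example of the mirrored layout.
// The module and function names are illustrative, not actual exports of this repo.
import { describe, it, expect } from 'vitest';
import { normalizeUrl } from '../../../src/utils/normalizeUrl';

describe('normalizeUrl', () => {
  it('adds a protocol when one is missing', () => {
    expect(normalizeUrl('example.com')).toBe('https://example.com/');
  });
});
```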
Quick Start:
```bash
bash tests/run_tests.sh
```

Available Test Commands:

```bash
# Run all tests
npm run test

# Run tests with coverage report
npm run test:coverage

# Run tests in watch mode
npm run test -- --watch

# Run specific test file
npm run test -- tests/src/scraper/scrapeWebsite.test.ts

# Run tests matching a name pattern
npm run test -- -t "analytics"
```

Test Categories:
- Unit Tests: Individual component and function testing
- Integration Tests: End-to-end workflow validation
- Schema Tests: Schema.org compliance verification
- Performance Tests: Analytics and performance monitoring
- UI Tests: React component rendering and interaction
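To illustrate the Schema Tests category, a compliance check might look like the sketch below. The extractor module, its name, and its return shape are assumptions for illustration, not the repository's actual API:

```ts
// Hypothetical Schema.org compliance test — extractor name and return shape are assumed.
import { describe, it, expect } from 'vitest';
import { extractSchemaOrgData } from '../../../src/scraper/schemaExtractor'; // hypothetical module

describe('Schema.org extraction', () => {
  it('returns JSON-LD objects with @context and @type', () => {
    const html =
      '<script type="application/ld+json">' +
      '{"@context":"https://schema.org","@type":"Article","headline":"Hello"}' +
      '</script>';
    const items = extractSchemaOrgData(html);
    expect(items[0]['@context']).toBe('https://schema.org');
    expect(items[0]['@type']).toBe('Article');
  });
});
```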
The test suite covers:
- Web scraping functionality and error handling
- Analytics dashboard components
- Data visualization components
- Schema.org data extraction and validation
- Performance monitoring and alerts
- Database integration (SQLMagic)
- Export functionality
- Error resilience and edge cases
```
./
├── src/                  # Source code
│   ├── components/       # React components
│   ├── scraper/          # Web scraping logic
│   ├── analytics/        # Analytics engine
│   ├── visualizations/   # Data visualization components
│   ├── utils/            # Utility functions
│   └── types/            # TypeScript type definitions
├── tests/                # Test files (mirrors src structure)
├── vitest.config.ts      # Test configuration
├── package.json          # Dependencies and scripts
└── README.md             # This file
```
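For reference, a minimal `vitest.config.ts` consistent with this layout might look like the following; the actual options in the repository may differ (the jsdom environment and the setup-file path are assumptions based on the structure above):

```ts
// vitest.config.ts — minimal sketch; the real options in this repo may differ.
import { defineConfig } from 'vitest/config';
import react from '@vitejs/plugin-react';

export default defineConfig({
  plugins: [react()],
  test: {
    environment: 'jsdom',                      // DOM environment for React component tests (assumed)
    setupFiles: ['tests/src/test-setup.ts'],   // global configuration and mocks
    include: ['tests/**/*.test.{ts,tsx}'],     // tests live under tests/, mirroring src/
    coverage: { reporter: ['text', 'html'] },  // backs `npm run test:coverage`
  },
});
```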
- `npm run dev` - Start development server
- `npm run build` - Build for production
- `npm run lint` - Run ESLint
- `npm run preview` - Preview production build
- `npm run test` - Run tests
- `npm run test:coverage` - Run tests with coverage
The application follows a modular architecture:
- Scraping Engine (`src/scraper/`) - Core web scraping functionality
- Analytics System (`src/analytics/`) - Performance monitoring and insights
- Visualization Layer (`src/visualizations/`) - Interactive data visualizations
- Component Library (`src/components/`) - Reusable UI components
- Utilities (`src/utils/`) - Helper functions and data processing
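A hypothetical sketch of how these layers could fit together is shown below; the import path, the function name (inferred from `tests/src/scraper/scrapeWebsite.test.ts`), and the options object are assumptions, not the project's documented API:

```ts
// Hypothetical composition of the layers above — names and signatures are assumed.
import { scrapeWebsite } from './src/scraper/scrapeWebsite'; // name inferred from the test file path

async function main() {
  // Scraping Engine: fetch a page and extract structured data
  const result = await scrapeWebsite('https://example.com', { timeoutMs: 10_000 }); // options are illustrative

  // The Analytics System and Visualization Layer would then consume `result`,
  // e.g. to build word clouds, network graphs, and Schema.org summaries,
  // while the export utilities could serialize it to GraphML, JSON, or CSV.
  console.log(JSON.stringify(result, null, 2));
}

main().catch(console.error);
```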
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Run `bash tests/run_tests.sh` to ensure all tests pass
- Submit a pull request
This project is licensed under the ISC License.