This Flask-based application scrapes the latest news headlines and descriptions from The Atlantic and stores the data before rendering it on a webpage. Additionally, it integrates with the Pokémon API to fetch and manage Pokémon data.
- News Scraper Application
- Table of Contents
- Requirements
- Setup Instructions
- 1. Clone the Repository
- 2. Create a Virtual Environment
- 3. Create a .gitignore
- 4. Install Dependencies
- 5. Set Up Google Applications and Keys
- Google Application Setup and .env Configuration
- 6. Data Schema
- 7. Download Frontend Dependencies
- 8. Run the Flask Application
- 9. Open Your Web Browser
- 10. Testing Instructions
- 11. Deployment
- 12. Environment Configuration
- 13. Deploy on Render
- 14. Deploy on Vercel
- 15. Database on Render
- Advanced Search
- Stretch Goals
- Learning Experience
The following packages are required to run the application:
- Flask: Web framework for Python.
- requests: HTTP library for sending GET requests to fetch the news data.
- beautifulsoup4: Library for parsing HTML and scraping data.
- google-auth: Library for authenticating with Google services.
- google-auth-oauthlib: Library for OAuth 2.0 authentication with Google.
- Svelte: A modern JavaScript framework for building user interfaces.
Follow these steps to get the application up and running:
Clone this repository to your local machine:
git clone https://github.com/mai-repo/RG-Knowledge-Check-1.git
python -m venv .venv
Add `.venv/` to the `.gitignore` file to prevent committing the virtual environment folder:
.venv/
Install all the required Python packages:
pip3 install -r requirements.txt
Follow the instructions to set up Google applications and obtain the necessary keys for authentication.
- Go to the Google Cloud Console.
- Create a new project and enable the following APIs:
- Google Identity Services API
- reCAPTCHA API
- Go to Credentials in the Google Cloud Console.
- Create OAuth 2.0 Client ID for a web app.
- Add authorized origins (e.g., http://127.0.0.1:5000/).
- Download the JSON file with client secrets.
- Register your site in the reCAPTCHA Admin Console.
- Choose reCAPTCHA type (v2).
- Obtain site key and secret key.
- Create a `.env` file in the root directory of your project.
- Add the following variables:
GOOGLE_CLIENT_SECRET=your-google-client-secret
RECAPTCHA_SECRET_KEY=your-recaptcha-secret-key
DATABASE_URL=your-render-database
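As a quick sanity check, the backend can read these values from the process environment. The snippet below is a minimal sketch using only the standard library; it assumes the `.env` file has already been loaded (for example, `flask run` loads it automatically when `python-dotenv` is installed):

```python
import os

# Names match the variables defined in .env above.
REQUIRED_VARS = ["GOOGLE_CLIENT_SECRET", "RECAPTCHA_SECRET_KEY", "DATABASE_URL"]

config = {}
for name in REQUIRED_VARS:
    value = os.environ.get(name)
    if not value:
        # Fail early with a clear message instead of a confusing error later.
        raise RuntimeError(f"Missing required environment variable: {name}")
    config[name] = value
```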
The news data is stored in a PostgreSQL database with the following schema:

```sql
CREATE TABLE IF NOT EXISTS news (
    id SERIAL PRIMARY KEY,
    headline TEXT NOT NULL,
    summary TEXT NOT NULL,
    link TEXT NOT NULL
);
```
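For illustration, here is a minimal sketch of how headlines could be scraped and shaped to match this table using requests and BeautifulSoup; the CSS selectors below are assumptions, not the exact ones this app uses:

```python
import requests
from bs4 import BeautifulSoup

def scrape_headlines(url="https://www.theatlantic.com/"):
    """Fetch the page and return rows shaped like the news table."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    rows = []
    # Illustrative selectors: the real article markup may differ.
    for article in soup.find_all("article"):
        headline = article.find("h2")
        summary = article.find("p")
        link = article.find("a", href=True)
        if headline and summary and link:
            rows.append({
                "headline": headline.get_text(strip=True),
                "summary": summary.get_text(strip=True),
                "link": link["href"],
            })
    return rows
```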
The Pokémon data is stored in a PostgreSQL database with the following schema:

```sql
CREATE TABLE IF NOT EXISTS pokemon (
    id SERIAL PRIMARY KEY,
    username TEXT NOT NULL,
    pokemonName TEXT NOT NULL,
    image TEXT NOT NULL
);
```
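A minimal sketch of fetching a Pokémon's name and sprite from the public PokéAPI (the `/pokemon/{name}` endpoint is PokéAPI's standard route; how the result is tied to a username is left out here):

```python
import requests

def fetch_pokemon(name):
    """Return a dict shaped like the pokemon table (minus id/username)."""
    response = requests.get(
        f"https://pokeapi.co/api/v2/pokemon/{name.lower()}", timeout=10
    )
    response.raise_for_status()
    data = response.json()
    return {
        "pokemonName": data["name"],
        "image": data["sprites"]["front_default"],  # URL of the default sprite
    }

# Example: fetch_pokemon("pikachu") -> {"pokemonName": "pikachu", "image": "https://..."}
```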
The favorite articles data is stored in a PostgreSQL database with the following schema:

```sql
CREATE TABLE IF NOT EXISTS favArt (
    id SERIAL PRIMARY KEY,
    username TEXT NOT NULL,
    news_id INT NOT NULL,
    FOREIGN KEY (news_id) REFERENCES news(id) ON DELETE CASCADE
);
```
The table for full-text search is created using the following schema:

```sql
CREATE TABLE IF NOT EXISTS news_fts (
    id INT PRIMARY KEY,
    headline TEXT,
    summary TEXT,
    link TEXT
);
```
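The table mirrors the news columns so it can be queried by keyword. One possible way to search it with PostgreSQL's built-in full-text functions is sketched below; it assumes `psycopg2` (not listed in the requirements above) and the `DATABASE_URL` from your `.env`:

```python
import os
import psycopg2

def search_news(keywords):
    """Return articles whose headline or summary match the given keywords."""
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    try:
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT id, headline, summary, link
                FROM news_fts
                WHERE to_tsvector('english', headline || ' ' || summary)
                      @@ plainto_tsquery('english', %s)
                """,
                (keywords,),
            )
            return cur.fetchall()
    finally:
        conn.close()
```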
Navigate to the frontend directory:
cd Frontend
Install dependencies:
npm install
From the project root, set the Flask entry point and start the backend server:

export FLASK_APP=Backend.main
flask run
Navigate to the frontend directory:
cd Frontend
Start the development server:
npm run dev
This will start the frontend application, and you can access it in your web browser at http://localhost:9000.
Follow these steps to run the tests for the application:
Ensure that you have installed all the required dependencies as mentioned in the Setup Instructions.
- Using unittest
To run tests with `unittest`, use the following command:
python -m unittest discover -s Backend/tests -p "test_*.py"
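The discover command picks up any file under Backend/tests named test_*.py. Below is a minimal sketch of what such a test might look like; the imported `app` object and the `/` route are assumptions about this project's layout:

```python
# Backend/tests/test_routes.py (illustrative)
import unittest

from Backend.main import app  # assumes main.py exposes the Flask app


class HomeRouteTest(unittest.TestCase):
    def setUp(self):
        # Flask's built-in test client lets us call routes without a server.
        self.client = app.test_client()

    def test_home_returns_ok(self):
        response = self.client.get("/")
        self.assertEqual(response.status_code, 200)


if __name__ == "__main__":
    unittest.main()
```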
If you want to generate a test coverage report, you can use `pytest-cov`.
- Install pytest-cov
Install `pytest-cov` using the following command:
pip install pytest-cov
- Run Tests with Coverage
Run the tests with coverage using the following command:
pytest --cov=Backend --cov-report=html Backend/tests
This will generate a coverage report in the `htmlcov` directory. You can view the report by opening the `index.html` file in a web browser:
open htmlcov/index.html
Create a `Dockerfile` in the root directory of your project:
> **Note:** The following `Dockerfile` is a template. You may need to adjust it according to your specific project requirements.
```dockerfile
# Use the official Python image from the Docker Hub
FROM python:3.9-slim
# Set the working directory
WORKDIR /app
# Copy the requirements file into the container
COPY requirements.txt .
# Install the dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the application code into the container
COPY . .
# Set environment variables
ENV FLASK_APP=Backend.main
# Expose the port the app runs on
EXPOSE 5000
# Run the application
CMD ["flask", "run", "--host=0.0.0.0"]
```
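If you want to try the image locally before deploying, a typical flow is `docker build -t news-scraper .` followed by `docker run -p 5000:5000 --env-file .env news-scraper` (the image tag is arbitrary).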
Create a `render.yaml` file in the root directory of your project to define the Render service configuration. A template is included with the repo:
```yaml
services:
  - type: web
    name: news-scraper
    env: docker
    dockerfilePath: ./Dockerfile
    envVars:
      - key: FLASK_ENV
        value: production
      - key: FLASK_APP
        value: Backend/main.py
      - key: GOOGLE_CLIENT_KEY
        value: GOOGLE_CLIENT_KEY
      - key: BACKEND_KEY
        fromDatabase: BACKEND_KEY
      - key: DATABASE_URL
        fromDatabase: DATABASE_URL
    startCommand: gunicorn -w 4 -b 0.0.0.0:8080 Backend.main:app
```
Create `.env.local` for local development:
VITE_API_BASE_URL=https://your-local-url.com
Create `.env.production` for production:
VITE_API_BASE_URL=https://your-production-url.com
Follow these steps to deploy your application on Render:
- Create a new web service on Render.
- Connect your GitHub repository.
- Set the build and start commands:
- Build Command: `pip install -r Backend/requirements.txt`
- Start Command: `gunicorn -w 4 -b 0.0.0.0:8080 Backend.main:app`
- Add environment variables in the Render dashboard:
GOOGLE_CLIENT_SECRET=your-google-client-secret
RECAPTCHA_SECRET_KEY=your-recaptcha-secret-key
DATABASE_URL=your-database-key
- Deploy the application.
Follow these steps to deploy your frontend application on Vercel:
- Log in to your Vercel account.
- Connect your GitHub repository to Vercel.
- Set the environment variables in the Vercel dashboard:
VITE_API_BASE_URL=https://your-backend-url.com
- Deploy the application.
To set up the database on Render:
- Create a new PostgreSQL database on Render.
- Note the database URL provided by Render.
- Add the database URL to your environment variables in the Render dashboard:
DATABASE_URL=your-database-url
- Update your application to use the Render database URL for database connections.
The advanced search feature allows users to search for news articles based on specific keywords. This feature enhances the user experience by providing more relevant search results.
- Navigate to the Search Page: Go to the search page in the application.
- Enter Keywords: Enter the keywords you want to search for in the search bar.
- View Results: The application will display the news articles that match the entered keywords.
Example
- If you want to search for articles related to "economy" or "Trump", enter "economy" and "trump" in the search bar and press Enter. The application will display all articles that contain either keyword.
- Allow users to choose from a variety of news sites
Building this News Scraper Application was a challenging yet rewarding experience in developing a full-stack application from scratch, integrating both backend and frontend technologies. I learned how to use Flask for the backend and Svelte for the frontend, and how to implement web scraping with BeautifulSoup to fetch news data. Deploying the application on Render and Vercel required careful configuration and taught me how to manage deployment pipelines and keep the application running smoothly in a production environment.