Predicting Formula 1 race results from historical data, sessions, and simulation.
Core stack: Python for data wrangling & modeling, scikit-learn for ML, Pandas for feature engineering, and FastF1 for F1 timing/telemetry data.
A machine learning project that predicts Formula 1 race results using historical data and free practice session information. This project uses Gradient Boosting (R-squared: 0.494) and Random Forest (R-squared: 0.553) models to forecast driver finishing positions for upcoming Grand Prix events.
This project leverages the FastF1 library to access Formula 1 telemetry and timing data, building predictive models that can forecast race outcomes based on:
- Historical race performance data
- Free practice session lap times
- Qualifying results (when available)
- Driver and team form throughout the season
- Grid (starting) positions
The project is organized into a dedicated Dutch GP folder containing all race-specific analysis and ML models:
f1-simulation-ml-project/
├── Dutch GP/    # All Dutch GP related files
│   ├── index.html    # Main results comparison interface
│   ├── dutch_gp_results.js    # JavaScript logic for results
│   ├── load_dutch_gp_results.py    # Python data loading scripts
│   ├── dutch_gp_results.json    # Race results data
│   ├── F1_Dutch_GP_prediction_ML_GBR_Random_Forest.ipynb    # ML models
│   ├── DUTCH_GP_RESULTS_README.md    # Results system guide
│   ├── DUTCH_GP_ORGANIZATION.md    # Complete organization guide
│   └── NEXT_RACE_PREPARATION_CHECKLIST.md    # Next race checklist
├── index.html    # Main project navigation hub
├── README.md    # This project documentation
└── F1 Car Image Aug 27 2025 (1).png    # Project logo
- Historical Race Analysis: Processes past race results to build driver performance profiles
- Free Practice Integration: Incorporates FP1, FP2, and FP3 session data for enhanced predictions
- Qualifying Data: Uses grid positions when available for more accurate predictions
- Driver Name Cleaning: Handles variations in driver names across different data sources
- Gradient Boosting Regressor: Primary model with optimized hyperparameters
- Random Forest Regressor: Alternative model for comparison (R-squared of about 0.55)
- Cross-Validation: Uses GroupKFold to prevent data leakage between races
- Feature Engineering: Creates comprehensive feature sets including:
  - Season-to-date performance metrics
  - Form scores and qualifying performance
  - Practice session statistics
  - Historical finish positions and reliability
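To illustrate why grouping by race matters, here is a minimal standard-library sketch of a group-aware split. It is not scikit-learn's GroupKFold implementation and not the notebook's code: the helper name `group_k_fold`, the round-robin assignment, and the round numbers are assumptions made for illustration. The point it demonstrates is that every row from a given race ends up on one side of the split only.

```python
def group_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs where no group spans both sides.

    Illustrative stand-in for a grouped K-fold splitter: whole groups
    are dealt round-robin into folds (an assumption of this sketch).
    """
    unique_groups = sorted(set(groups))
    if n_splits > len(unique_groups):
        raise ValueError("n_splits cannot exceed the number of groups")
    fold_of_group = {g: i % n_splits for i, g in enumerate(unique_groups)}
    for fold in range(n_splits):
        test_idx = [i for i, g in enumerate(groups) if fold_of_group[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of_group[g] != fold]
        yield train_idx, test_idx


# Hypothetical rows: one per driver per race, labelled by race round.
race_rounds = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
splits = list(group_k_fold(race_rounds, n_splits=2))
```

With scikit-learn, the equivalent would be `GroupKFold(n_splits=...).split(X, y, groups=...)`, passing a race identifier as `groups`.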
- Point Predictions: Direct finish position predictions
- Monte Carlo Simulations: 3000+ simulations for probability analysis
- Win/Podium Probabilities: Calculates chances for different finishing positions
- Expected Points: Projects championship points based on predicted finishes
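The outputs listed above can be sketched as follows. This is a simplified illustration rather than the notebook's simulation: it assumes Gaussian noise around each point prediction, and the function name `simulate_race`, the `noise_sd` value, and the placeholder drivers are all hypothetical. The points table is the standard Grand Prix scoring for positions 1 to 10 (sprint points are ignored).

```python
import random
from collections import Counter

# Standard Grand Prix points for finishing positions 1-10.
POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]


def simulate_race(predicted_pos, n_sims=3000, noise_sd=2.0, seed=0):
    """Estimate win/podium probabilities and expected points per driver.

    predicted_pos: dict of driver -> predicted finishing position (float).
    noise_sd: assumed standard deviation of prediction error, in positions.
    """
    rng = random.Random(seed)
    wins, podiums, points = Counter(), Counter(), Counter()
    for _ in range(n_sims):
        # Perturb each prediction, then rank drivers by the noisy value.
        noisy = {d: p + rng.gauss(0.0, noise_sd) for d, p in predicted_pos.items()}
        order = sorted(noisy, key=noisy.get)
        wins[order[0]] += 1
        for d in order[:3]:
            podiums[d] += 1
        for pos, d in enumerate(order[:len(POINTS)]):
            points[d] += POINTS[pos]
    return {
        d: {
            "win_prob": wins[d] / n_sims,
            "podium_prob": podiums[d] / n_sims,
            "expected_points": points[d] / n_sims,
        }
        for d in predicted_pos
    }


# Hypothetical point predictions for four placeholder drivers.
summary = simulate_race({"DRV_A": 1.8, "DRV_B": 2.4, "DRV_C": 5.1, "DRV_D": 9.7})
```

Fixing the seed makes the run reproducible. Raising `n_sims` reduces the sampling noise in the estimated probabilities, at the cost of runtime.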
Before running this project, ensure you have the following installed:
# Core dependencies
pip install fastf1
pip install scikit-learn
pip install pandas
pip install joblib
pip install tqdm
pip install matplotlib
pip install numpy
- Open the main project hub: index.html. This provides navigation to all project components
- Access Dutch GP Analysis: Navigate to the Dutch GP/ folder for complete race analysis
- Run ML Models: Use the Jupyter notebook in the Dutch GP folder for predictions
- View Results: Check the results comparison interface for Dutch GP analysis
- Clone or download the project files
- Install dependencies:
    pip install -r requirements.txt
  Or install manually:
    pip install fastf1 scikit-learn pandas joblib tqdm matplotlib numpy
- Open the Jupyter notebook (located in the Dutch GP/ folder):
    jupyter notebook F1_Dutch_GP_prediction_ML_GBR_Random_Forest.ipynb
# Run prediction for a specific Grand Prix
race_order, summary = run_prediction_with_fallback(
    year=2025,
    grand_prix="Hungarian Grand Prix",
    mode="race-only",
    n_sims=5000,
    race_only=False,
    model_type='GradientBoosting'
)
# Compare Gradient Boosting vs Random Forest
race_order_gb, summary_gb = run_prediction_with_fallback(
    2025, "Hungarian Grand Prix", mode="race-only",
    n_sims=5000, race_only=False, model_type='GradientBoosting'
)
race_order_rf, summary_rf = run_prediction_with_fallback(
    2025, "Hungarian Grand Prix", mode="race-only",
    n_sims=5000, race_only=False, model_type='RandomForest'
)
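Once both models have produced a race order, one way to quantify how much they disagree is sketched below. The return type of `run_prediction_with_fallback` is not documented here, so this assumes each race order can be reduced to a plain list of driver names in predicted finishing order. The helper names and placeholder drivers are hypothetical.

```python
def position_shifts(order_a, order_b):
    """Return driver -> (position in order_b minus position in order_a)."""
    pos_b = {driver: i for i, driver in enumerate(order_b, start=1)}
    return {driver: pos_b[driver] - i for i, driver in enumerate(order_a, start=1)}


def spearman_rho(order_a, order_b):
    """Spearman rank correlation for two orderings of the same drivers (no ties)."""
    n = len(order_a)
    d_squared = sum(shift ** 2 for shift in position_shifts(order_a, order_b).values())
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical orders from the two models (placeholder driver codes).
gb_order = ["DRV_A", "DRV_B", "DRV_C", "DRV_D"]
rf_order = ["DRV_B", "DRV_A", "DRV_C", "DRV_D"]
shifts = position_shifts(gb_order, rf_order)
rho = spearman_rho(gb_order, rf_order)
```

A correlation near 1 means the two models largely agree on the order. The per-driver shifts show where they differ.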
- race-only: Uses only historical race data (no practice sessions)
- with-grid: Includes qualifying results and grid positions
- no-grid: Excludes grid position information
- load_past_race_results(): Retrieves historical race data
- load_qualifying_results(): Gets qualifying session results
- load_practice_features(): Extracts free practice session data
- build_season_table(): Creates season-wide performance metrics
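As a rough idea of what season-wide metrics can look like, the sketch below aggregates results from rounds strictly before the target race, so the race being predicted never contributes to its own features. It is not the notebook's `build_season_table()`: the function name `season_to_date`, the tuple layout, and the sample rows are assumptions for illustration.

```python
from collections import defaultdict


def season_to_date(results, upto_round):
    """Average finish and total points per driver from rounds before `upto_round`.

    results: iterable of (round, driver, finish_position, points) tuples.
    """
    finishes = defaultdict(list)
    points = defaultdict(float)
    for rnd, driver, finish, pts in results:
        if rnd < upto_round:  # exclude the target round and anything later
            finishes[driver].append(finish)
            points[driver] += pts
    return {
        d: {"races": len(f), "avg_finish": sum(f) / len(f), "points": points[d]}
        for d, f in finishes.items()
    }


# Hypothetical results for two placeholder drivers over three rounds.
results = [
    (1, "DRV_A", 1, 25), (1, "DRV_B", 3, 15),
    (2, "DRV_A", 2, 18), (2, "DRV_B", 1, 25),
    (3, "DRV_A", 5, 10), (3, "DRV_B", 2, 18),
]
table = season_to_date(results, upto_round=3)
```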
- build_training_data(): Prepares labeled training dataset
- assemble_prediction_frame(): Creates feature matrix for predictions
- fit_model(): Trains and evaluates ML models
- convert_sim_preds_to_standings(): Converts predictions to race standings
- clean_driver_name(): Standardizes driver names
- safe_div(): Handles division by zero safely
- session_key(): Creates unique session identifiers
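The notebook's implementations of these helpers are not reproduced here. The sketch below shows one plausible shape for each, based only on the one-line descriptions above. The signatures, the accent-stripping approach, the `default` argument, and the key format are all assumptions.

```python
import unicodedata


def clean_driver_name(name):
    """Normalize a driver name: strip accents, collapse whitespace, title-case."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.split()).title()


def safe_div(numerator, denominator, default=0.0):
    """Divide, returning `default` instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else default


def session_key(year, grand_prix, session):
    """Build an identifier such as '2025_hungarian_grand_prix_FP2' (assumed format)."""
    return f"{year}_{grand_prix.strip().lower().replace(' ', '_')}_{session.upper()}"
```

Note that `str.title()` is a blunt tool: names with particles or apostrophes may need a lookup table instead.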
The models are evaluated using:
- Mean Absolute Error (MAE): Average prediction error in positions
- Root Mean Square Error (RMSE): Penalizes larger prediction errors
- R-squared Score: Measures model fit quality
- Cross-Validation: GroupKFold keeps all rows from a race in the same fold, so scores are not inflated by leakage between races
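For reference, the three error metrics can be computed by hand as in the sketch below. It uses only the standard library, and the sample positions are made up. In practice scikit-learn's metric functions would normally be used instead.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R-squared for paired actual/predicted positions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - (mse * n) / ss_tot  # 1 - SS_res / SS_tot
    return {"mae": mae, "rmse": math.sqrt(mse), "r2": r2}


# Hypothetical actual vs. predicted finishing positions.
metrics = regression_metrics([1, 2, 3, 4], [2, 2, 3, 5])
```

An R-squared of 0.553 therefore means the model explains a little over half of the variance in finishing position relative to always predicting the mean.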
The project includes data for the following 2025 season races:
- Australian Grand Prix
- Chinese Grand Prix
- Japanese Grand Prix
- Bahrain Grand Prix
- Saudi Arabian Grand Prix
- Miami Grand Prix
- Emilia Romagna Grand Prix
- Monaco Grand Prix
- Spanish Grand Prix
- Canadian Grand Prix
- Austrian Grand Prix
- British Grand Prix
- Belgian Grand Prix
- Hungarian Grand Prix
- FastF1 Library: Primary data source for F1 telemetry and timing
- Official F1 Data: Race results, qualifying times, and session data
- Practice Sessions: FP1, FP2, FP3 lap times and performance metrics
- Data Availability: The project includes fallback logic to use previous year's data if current year data is unavailable
- Caching: FastF1 caches data locally for improved performance
- Driver Changes: The model handles driver name variations and team changes
- Missing Data: Robust handling of missing practice sessions or qualifying data
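The previous-year fallback mentioned above can be sketched generically. This is not the project's `run_prediction_with_fallback`. The `loader` callable, the `max_years_back` parameter, and the stand-in loader are hypothetical, chosen so the pattern can be shown without network access.

```python
def load_with_fallback(loader, year, grand_prix, max_years_back=1):
    """Try `loader(year, grand_prix)`, stepping back one season at a time on failure.

    `loader` is a hypothetical callable that returns data, or raises or
    returns None when nothing is available. Returns (data, year_used).
    """
    last_error = None
    for candidate in range(year, year - max_years_back - 1, -1):
        try:
            data = loader(candidate, grand_prix)
        except Exception as exc:  # e.g. session not yet published
            last_error = exc
            continue
        if data is not None:
            return data, candidate
    raise LookupError(
        f"No data for {grand_prix} in {year - max_years_back}-{year}"
    ) from last_error


# Stand-in loader: pretends only 2024 data exists.
def fake_loader(year, grand_prix):
    if year != 2024:
        raise ValueError("not available")
    return {"year": year, "grand_prix": grand_prix}


data, year_used = load_with_fallback(fake_loader, 2025, "Dutch Grand Prix")
```

Returning the year actually used lets downstream code flag predictions that rest on older data.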
Feel free to contribute to this project by:
- Adding new features or models
- Improving data processing functions
- Enhancing prediction accuracy
- Adding support for additional Grands Prix
This project is for educational and research purposes. Please respect F1 data usage terms and conditions.
- FastF1 library developers for providing access to F1 data
- Formula 1 for the official timing and telemetry data
- The open-source community for the machine learning libraries used
Note: This project is designed for educational purposes and race prediction analysis. Actual race results may vary due to numerous factors not captured in the model.