This project aims to estimate unknown parts of an ordinary differential equation (ODE) system in the context of biological signalling networks. The project can be divided into two central parts with their own specific methodology:
- approximating the unknown part(s) of the ODE system using a neural network (UDE approximation);
- estimating the formula of the resulting unknown function(s) (equation discovery).
To test the approach, two biological models were used to simulate data and build a UDE system out of their respective ODE system:
- the Negative FeedBack (NFB) model presented in Chapter 13: "Parameter Estimation, Sloppiness, and Model Identifiability" by D. Daniels, M. Dobrzyński and D. Fey in "Quantitative Biology: Theory, Computational Methods and Examples of Models" (2018) [1].
- the ERK signalling (ERK) model presented by Ryu et al. in "Frequency modulation of ERK activation dynamics rewires cell fate" (2015) [2].
The goal of the method is typically to identify feedback mechanisms that are part of the dynamics, using a measurable variable of the system. Below are the schematics of the two models, where g2p and ERK, respectively, are the observed variables:
For the first part, a UDE model, i.e. an ODE model whose unknown part(s) are modelled by a neural network, is fitted to real or simulated data. Here is an example showing the original ODE system of the NFB model introduced above and the corresponding UDE system:
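As a minimal illustration (not the exact NFB equations: the rate constants, kinetics and placement of the unknown term below are assumptions), replacing an unknown term with a Lux.jl neural network looks roughly like this:

```julia
using Lux, Random

# Small multilayer perceptron standing in for the unknown feedback term U(g2p).
nn = Lux.Chain(Lux.Dense(1 => 10, tanh), Lux.Dense(10 => 1))
ps, st = Lux.setup(Random.default_rng(), nn)

v = (1.0, 1.0, 1.0, 1.0)   # placeholder rate constants (illustrative values)

# UDE right-hand side: the known kinetic structure is kept, while the unknown
# feedback is replaced by the neural network output.
function ude_rhs!(du, u, p, t)
    g1p, g2p = u
    U = first(nn([g2p], p, st))[1]              # NN approximation of the unknown term
    du[1] = v[1] * (1 - g1p) * U - v[2] * g1p   # unknown feedback acts on the first kinase (assumed)
    du[2] = v[3] * g1p * (1 - g2p) - v[4] * g2p
end
```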
The parameters of the neural network are optimised while solving the ODE problem. The approach is largely inspired by the tutorial "Automatically Discover Missing Physics by Embedding Machine Learning into Differential Equations" of the SciML ecosystem. Here is a schematic of the optimisation process:
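In code, the loop shown in the schematic boils down to roughly the following (a condensed sketch in the spirit of the SciML tutorial; it assumes the `ude_rhs!` and `ps` from the sketch above as well as hypothetical data arrays `t_data` and `g2p_data`):

```julia
using OrdinaryDiffEq, Optimization, OptimizationOptimisers
using SciMLSensitivity, ComponentArrays, Zygote

# Wrap the UDE in an ODEProblem whose parameters are the neural network weights.
θ0 = ComponentArray(ps)
prob = ODEProblem(ude_rhs!, [0.0, 0.0], (t_data[1], t_data[end]), θ0)

# Loss: solve the UDE with the current weights and compare the observed
# variable (second state, g2p) to the data.
function loss(θ, _)
    sol = solve(remake(prob; p = θ), Tsit5(); saveat = t_data)
    return sum(abs2, Array(sol)[2, :] .- g2p_data)
end

# Gradient-based optimisation of the weights with Adam.
optf = OptimizationFunction(loss, Optimization.AutoZygote())
res = Optimization.solve(OptimizationProblem(optf, θ0), OptimizationOptimisers.Adam(0.01); maxiters = 500)
```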
To better illustrate the process, here is an animation of the UDE optimisation for the NFB model in the negative feedback case "ab" with a 0.05 M input stimulation. The left panel shows the ODE solution at the given iteration, the middle panel shows the data fit of the second kinase (the observed variable), and the right panel shows the neural network output.
nfb_animation.mp4
For the second part, the function(s) represented by the neural network are fed into a SINDy-based algorithm to estimate their mathematical formula [4], [5]. The algorithm uses sparse regression to select the optimal terms and their coefficients.
The purpose is to gain insights about the signalling mechanisms in a data-driven manner while incorporating some prior knowledge about the system.
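As an illustration of this step, a sparse regression over a candidate library can be set up with DataDrivenDiffEq.jl and DataDrivenSparse.jl roughly as follows (here `X` and `Y` stand for the neural network inputs and outputs recovered from the UDE step; the toy data and polynomial library are assumptions for the sake of a runnable sketch):

```julia
using DataDrivenDiffEq, DataDrivenSparse, ModelingToolkit

# Placeholder data: NN inputs X (features × samples) and corresponding outputs Y.
X = rand(1, 200)
Y = 2.0 .* X .^ 2 .- 0.5 .* X                    # toy target so the sketch runs end to end

@variables x
basis = Basis(polynomial_basis([x], 3), [x])     # candidate terms: 1, x, x², x³

ddprob = DirectDataDrivenProblem(X, Y)           # regression problem Y ≈ Θ(X)ξ
res = solve(ddprob, basis, STLSQ(0.1))           # sparse regression (STLSQ, threshold 0.1)
println(get_basis(res))                          # recovered symbolic expression(s)
```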
The workflow of the project is structured into two consecutive parts:
- UDE approximation: the UDE approximation scripts output the neural network estimation for a specified UDE model. In this project, UDE systems for the NFB and ERK models were used. The simulated data as well as the UDE optimisation results are saved in CSV format.
- Equation discovery (E-SINDy): the E-SINDy scripts take the previously generated UDE results as input and output the optimal equation formula. The output also contains various statistics about the E-SINDy run and is saved in JLD2 format.
In each case, interactive Pluto/Jupyter notebooks and Julia scripts are provided: the notebooks for easier readability and the scripts for automation.
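The intermediate files can be inspected with standard Julia tooling, for example (a minimal I/O sketch assuming CSV.jl, DataFrames.jl and JLD2.jl; the file names are hypothetical):

```julia
using CSV, DataFrames, JLD2

# Read a hypothetical UDE result file produced by the first part of the workflow.
ude_df = CSV.read("Data/nfb_ab_ude_results.csv", DataFrame)

# Load a hypothetical E-SINDy result file produced by the second part.
esindy_results = load("Data/nfb_ab_esindy.jld2")
```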
This repository contains three main directories:
- Code: Pluto/Jupyter notebooks and Julia scripts.
  - UDE approximation:
    - Pluto notebook: `<model_name>_ude_approximation_pluto.jl`
    - Jupyter notebook: `<model_name>_ude_approximation.ipynb`
    - Julia script: `<model_name>_ude_approximation.jl`
  - E-SINDy:
    - Pluto notebook: `<model_name>_esindy_pluto.jl`
    - Jupyter notebook: `<model_name>_esindy.ipynb`
    - Julia scripts: `esindy.jl` and `esindy_utils.jl`

  Note: the Julia script `esindy_utils.jl` contains all the utility functions used to run the equation discovery part, while the script `esindy.jl` runs the algorithm.
- Data: final and intermediate results (CSV and JLD2 files) obtained for the two considered ODE systems, i.e. the NFB and ERK models.
- Plots: all plots of the results.
To run the code in this repository, you need to have Julia installed, which can be downloaded at https://julialang.org/downloads/. Note that Julia version 1.10.4 was used, as referenced in the Manifest.toml file.
Clone this repository to your local machine.
git clone https://github.com/girochat/DataDrivenEquationDiscovery.git
For reproducibility, it is recommended to use the directory of the project as a Julia environment:
- Go to the Code directory of the repository:
  `cd /your_path_to_the_repo/DataDrivenEquationDiscovery/Code`
- Open the Julia REPL and enter the package manager (Pkg) mode by pressing `]`.
- In the Pkg REPL, activate the local environment and instantiate the packages:
  `pkg> activate .`
  `pkg> instantiate`
Pluto.jl is an interactive notebook environment for Julia similar to Jupyter for Python. It provides interactivity for running the UDE approximation and E-SINDy and visualising the results. To open the notebook, follow these steps:
Start the Pluto notebook in Julia:
using Pluto
Pluto.run()
In the Pluto interface, open the desired notebook file to start exploring the method and visualising the results.
If you prefer using Jupyter Notebook, Jupyter versions of the Pluto notebooks are also provided.
The UDE scripts are specific to the ODE model considered and take several command-line arguments. Make sure you are in the Code/ directory before running the command, or replace the dot in `--project=.` with the relative/absolute path to the Code/ directory.
To run the NFB model UDE approximation script:
julia --project=. NFB_ude_approximation.jl \
<NFB model type (no/a/b/ab)> \
<input concentration> \
<save parameters (y/n)> \
<load parameters (y/n)> \
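For example, to fit the negative-feedback case "ab" with an input concentration of 0.05, without saving or loading the neural network parameters (argument values are purely illustrative):
julia --project=. NFB_ude_approximation.jl ab 0.05 n n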
To run the ERK model UDE approximation script:
julia --project=. ERK_ude_approximation.jl \
<Growth factor type (EGF/NGF)> \
<Type of input concentration (high/low)> \
<pulse duration> \
<pulse frequency> \
<save parameters (y/n)> \
<load parameters (y/n)> \
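For example, for NGF stimulation at high concentration with a pulse duration of 10 and a pulse frequency of 3, without saving or loading parameters (values and units are purely illustrative):
julia --project=. ERK_ude_approximation.jl NGF high 10 3 n n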
Any output obtained with the UDE approximation scripts (CSV format) can be used to run the E-SINDy script, which also takes command-line arguments:
julia --project=. esindy.jl \
<model type (NFB/ERK)> \
<Number of bootstraps> \
<Coefficient threshold> \
<Output filename> \
<List of CSV files> \
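For example, to run E-SINDy on a hypothetical NFB result file with 100 bootstraps and a coefficient threshold of 0.3 (file names and values are purely illustrative):
julia --project=. esindy.jl NFB 100 0.3 nfb_esindy_results nfb_ab_ude_results.csv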
This project relies on several important Julia packages for data-driven modeling and differential equations:
- ModelingToolkit.jl: used for symbolic modeling of dynamical systems. It provides a high-performance, composable modeling framework for automatically parallelized scientific computing.
- Lux.jl: employed for building the neural networks. It offers a flexible and extensible deep learning framework with a focus on compatibility with scientific computing packages.
- DataDrivenDiffEq.jl: central to the data-driven modeling approach of this project. It provides tools for discovering governing equations from data, including an implementation of Sparse Identification of Nonlinear Dynamics (SINDy).
- DataDrivenSparse.jl: used for sparse regression and feature selection for the models. It implements algorithms such as:
  - STLSQ
  - ADMM
  - SR3
- HyperTuning.jl: used to optimise the hyperparameters of the sparse regression algorithm, which play a crucial part in obtaining the optimal solution to the problem.
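Swapping the sparsity-promoting algorithm only changes the solver passed to the sparse regression; here is a minimal sketch reusing the `ddprob` and `basis` from the SINDy example above (the thresholds are placeholder values):

```julia
using DataDrivenDiffEq, DataDrivenSparse

# Compare the equations recovered by the different sparse regression algorithms.
for alg in (STLSQ(0.1), ADMM(0.1), SR3(0.1))
    res = solve(ddprob, basis, alg)
    println(nameof(typeof(alg)), ": ", get_basis(res))
end
```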
These libraries form the backbone of this data-driven modeling pipeline, enabling the discovery and analysis of complex dynamical systems from experimental data.
[1] D. Daniels et al. (2018). Chapter 13: Parameter Estimation, Sloppiness, and Model Identifiability in "Quantitative Biology: Theory, Computational Methods and Examples of Models".
[2] Ryu et al. (2015). "Frequency modulation of ERK activation dynamics rewires cell fate". DOI
[3] Rackauckas et al. (2021). "Universal Differential Equations for Scientific Machine Learning". DOI
[4] Brunton et al. (2016). "Discovering governing equations from data by sparse identification of nonlinear dynamical systems". DOI
[5] Fasel et al. (2022). "Ensemble-SINDy: Robust sparse model discovery in the low-data, high-noise limit, with active learning and control". DOI
This project is licensed under the terms of the MIT license.



