The goal of this project is to gain data-driven insights into signaling mechanisms using recent machine learning tools such as UDEs (ODE with neural network approximator) and SINDy-based equation discovery.


Data Driven Equation Discovery for Biological Signalling Networks

Project description

This project aims to estimate unknown parts of an ordinary differential equation (ODE) system in the context of biological signalling networks. The project can be divided into two central parts, each with its own methodology:

  1. approximating the unknown part(s) of the ODE system with a neural network (UDE approximation);
  2. estimating the formula of the resulting unknown function(s) (equation discovery).

To test the approach, two biological models were used to simulate data and to build UDE systems from their respective ODE systems:

  • the Negative FeedBack (NFB) model presented in Chapter 13: Parameter Estimation, Sloppiness, and Model Identifiability by D. Daniels, M. Dobrzyński, D. Fey in "Quantitative Biology: Theory, Computational Methods and Examples of Models" (2018).
  • the ERK signalling (ERK) model presented by Ryu et al. in "Frequency modulation of ERK activation dynamics rewires cell fate" (2015) DOI.

The method typically aims to identify feedback mechanisms that shape the dynamics from a measurable variable of the system. Here are the schematics of the two models, where g2p and ERK, respectively, are the observed variables:

models_schematics

For the first part, a UDE model, i.e. an ODE model whose unknown part(s) are represented by a neural network, is fitted to real or simulated data. Here is an example showing the original ODE system of the NFB model introduced above alongside the corresponding UDE system:
 

nfb_ode_ude
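To make the idea concrete, here is a minimal sketch of a UDE in Julia. The two-variable system, its rate constants, and the network architecture are hypothetical stand-ins, not the repository's actual NFB equations; the point is only how a Lux network slots into the right-hand side of an ODE:

```julia
using Lux, OrdinaryDiffEq, ComponentArrays, Random

rng = Random.default_rng()
nn = Lux.Chain(Lux.Dense(2 => 8, tanh), Lux.Dense(8 => 2))
p_nn, st = Lux.setup(rng, nn)

# Hypothetical two-variable system: known production/decay terms plus
# unknown interaction terms replaced by the neural network output U.
function ude!(du, u, p, t)
    g1p, g2p = u
    U, _ = nn([g1p, g2p], p, st)     # NN stands in for the unknown part(s)
    du[1] = 0.05 * (1 - g1p) - 0.1 * g1p + U[1]
    du[2] = g1p * (1 - g2p) - 0.1 * g2p + U[2]
end

# The network parameters become the parameters of the ODE problem.
prob = ODEProblem(ude!, [0.0, 0.0], (0.0, 10.0), ComponentArray(p_nn))
sol = solve(prob, Tsit5(), saveat = 0.5)
```

Packing the Lux parameters into a `ComponentArray` keeps them as a flat vector that ODE and optimisation routines can treat like any other parameter set.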

 

The parameters of the neural network are optimised while solving the ODE problem. The approach is largely inspired by the SciML tutorial "Automatically Discover Missing Physics by Embedding Machine Learning into Differential Equations". Here is a schematic of the optimisation process:

ude_optimisation
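The optimisation loop above can be sketched as follows. This is a self-contained toy (a one-variable system with synthetic data from u' = -u), mirroring the structure of the SciML tutorial rather than reproducing the repository's training code; all names and settings are illustrative:

```julia
using Lux, OrdinaryDiffEq, Optimization, OptimizationOptimisers,
      SciMLSensitivity, ComponentArrays, Random

rng = Random.default_rng()
nn = Lux.Chain(Lux.Dense(1 => 8, tanh), Lux.Dense(8 => 1))
p0, st = Lux.setup(rng, nn)

# Synthetic "data": samples of u' = -u with u(0) = 1, for the NN to recover.
t_data = 0.0:0.2:2.0
u_data = exp.(-collect(t_data))

function ude!(du, u, p, t)
    du[1] = first(first(nn([u[1]], p, st)))  # NN stands in for the unknown RHS
end

prob = ODEProblem(ude!, [1.0], (0.0, 2.0), ComponentArray(p0))

# Loss: solve the ODE with the current NN parameters, compare to the data.
function loss(p, _)
    sol = solve(remake(prob, p = p), Tsit5(), saveat = t_data)
    sum(abs2, Array(sol)[1, :] .- u_data)
end

# Each optimiser step re-solves the ODE and backpropagates through it.
optf = OptimizationFunction(loss, Optimization.AutoZygote())
res = solve(OptimizationProblem(optf, ComponentArray(p0)),
            OptimizationOptimisers.Adam(0.05); maxiters = 300)
```

The key point is that the ODE solve sits inside the loss function, so gradients with respect to the network parameters flow through the solver (via SciMLSensitivity's adjoint machinery).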

To better illustrate the process, here is an animation of the UDE optimisation for the NFB model in the negative feedback case "ab" with a 0.05 M input stimulation. The left panel shows the ODE solution at the given iteration; the middle panel shows the fit to the data of the second kinase (the observed variable); the right panel shows the neural network output.

 

nfb_animation.mp4

 

For the second part, the function(s) represented by the neural network are fed into a SINDy-based algorithm to estimate their mathematical formula [4], [5]. The algorithm uses sparse regression to select the optimal terms and their coefficients ($\Xi$) from a library of simple functions ($\Theta(X)$) to build the equation formula. Below is a schematic of the SINDy algorithm, where $Y = (Y_1, Y_2, \dots, Y_k)$ is the input data for which the equation is to be retrieved:

ude_optimisation
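The sparse regression step at the heart of SINDy can be illustrated with a small self-contained example of sequential thresholded least squares (STLSQ). The library and target function here are hypothetical; the repository itself relies on DataDrivenDiffEq.jl/DataDrivenSparse.jl rather than this hand-rolled version:

```julia
using LinearAlgebra

x = range(-2, 2, length = 200)
Θ = hcat(ones(length(x)), x, x .^ 2, x .^ 3)  # library Θ(X): [1, x, x², x³]
y = 0.5 .* x .- 2.0 .* x .^ 3                 # ground truth uses only two terms

# STLSQ: alternate between least squares and zeroing small coefficients,
# so that y ≈ Θ * Ξ with as few active terms as possible.
function stlsq(Θ, y; λ = 0.1, iters = 10)
    Ξ = Θ \ y                          # initial dense least-squares fit
    for _ in 1:iters
        small = abs.(Ξ) .< λ
        Ξ[small] .= 0                  # threshold out small coefficients
        big = .!small
        Ξ[big] = Θ[:, big] \ y         # refit only the surviving terms
    end
    return Ξ
end

Ξ = stlsq(Θ, y)   # only the x and x³ coefficients survive
```

The threshold λ controls sparsity and is exactly the kind of hyperparameter that must be tuned to recover the correct equation, which is why the project tunes it (see the HyperTuning.jl dependency below).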

The purpose is to gain insight into the signalling mechanisms in a data-driven manner while incorporating prior knowledge about the system.

Project Structure

Code

The workflow of the project is structured into two consecutive parts:

  1. UDE approximation: the UDE approximation scripts output the neural network estimation for a specified UDE model. In this project, UDE systems for the NFB model and the ERK model were used. The simulated data as well as the UDE optimisation results are saved in CSV format.
  2. Equation discovery (E-SINDy): the E-SINDy scripts take the previously generated UDE results as input and output the optimal equation formula. The output, which also contains various statistics about the E-SINDy run, is saved in JLD2 format.

For each part, interactive Pluto/Jupyter notebooks and plain Julia scripts are provided: the notebooks for readability and exploration, the scripts for automation.

Folders

This repository contains three main directories:

  • Code:
    Pluto/Jupyter notebooks and Julia scripts.

    • UDE approximation:
      Pluto notebook: <model_name>_ude_approximation_pluto.jl
      Jupyter notebook: <model_name>_ude_approximation.ipynb
      Julia script: <model_name>_ude_approximation.jl
    • E-SINDy:
      Pluto notebook: <model_name>_esindy_pluto.jl
      Jupyter notebook: <model_name>_esindy.ipynb
      Julia script: esindy.jl and esindy_utils.jl
      Note: the Julia script esindy_utils.jl contains all the utility functions used for the equation discovery part, while the script esindy.jl runs the algorithm.
  • Data:
    Final and intermediate results (CSV and JLD2 files) obtained for the two considered ODE systems, i.e. the NFB and ERK models.

  • Plots:
    All plots of the results.

Installation

Install Julia

To run the code in this repository, you need Julia, which can be downloaded at https://julialang.org/downloads/. Please note that Julia version 1.10.4 was used, as referenced in the Manifest.toml file.

Clone the Repository

Clone this repository to your local machine.

  git clone https://github.com/girochat/DataDrivenEquationDiscovery.git

Install Dependencies

For reproducibility, it is recommended to use the directory of the project as a Julia environment:

  1. Go to the Code directory of the repository:
    cd /your_path_to_the_repo/DataDrivenEquationDiscovery/Code
  2. Start Julia and enter the package manager (Pkg) mode by pressing ']' in the REPL.
  3. In Pkg mode, activate the local environment and instantiate the packages:
    pkg> activate .
    pkg> instantiate

Usage

Notebooks

Pluto.jl is an interactive notebook environment for Julia, similar to Jupyter for Python. It provides interactivity for running the UDE approximation and E-SINDy and for visualising the results. To open a notebook, start Pluto from Julia:

using Pluto
Pluto.run()

In the Pluto interface, open the desired notebook file to start exploring the method and visualising the results.

If you prefer Jupyter Notebook, Jupyter versions of the Pluto notebooks are also provided.

Julia Scripts

1. Run UDE approximation script

The UDE scripts are specific to the ODE model considered and take several command-line arguments. Make sure to be in the Code/ folder before running the command, or replace the dot in --project=. with the relative/absolute path to the Code/ directory.

 

To run the NFB model UDE approximation script:

julia --project=. NFB_ude_approximation.jl \
<NFB model type (no/a/b/ab)> \
<input concentration> \
<save parameters (y/n)> \
<load parameters (y/n)>

To run the ERK model UDE approximation script:

julia --project=. ERK_ude_approximation.jl \
<Growth factor type (EGF/NGF)> \
<Type of input concentration (high/low)> \
<pulse duration> \
<pulse frequency> \
<save parameters (y/n)> \
<load parameters (y/n)>

2. Run E-SINDy script

Any output obtained with the UDE approximation scripts (CSV format) can be used to run the E-SINDy script. It also takes command-line arguments:

julia --project=. esindy.jl \
<model type (NFB/ERK)> \
<Number of bootstraps> \
<Coefficient threshold> \
<Output filename> \
<List of CSV files>

Dependencies and Key Libraries

This project relies on several important Julia packages for data-driven modeling and differential equations:

  • ModelingToolkit.jl
    ModelingToolkit.jl is used for symbolic modeling of dynamical systems. It provides a high-performance, composable modeling framework for automatically parallelized scientific computing.

  • Lux.jl
    Lux.jl is employed for building neural networks. It offers a flexible and extensible deep learning framework with a focus on compatibility with scientific computing packages.

  • DataDrivenDiffEq.jl
    DataDrivenDiffEq.jl is central to the data-driven modeling approach of this project. It provides tools for discovering governing equations from data, including an implementation of Sparse Identification of Nonlinear Dynamics (SINDy).

  • DataDrivenSparse.jl
    DataDrivenSparse.jl was used for sparse regression and feature selection for the models. It implements algorithms such as:

    • STLSQ
    • ADMM
    • SR3
  • HyperTuning.jl
    HyperTuning.jl was used to optimise the hyperparameters of the sparse regression algorithm, which play a crucial part in obtaining the optimal solution to the problem.

These libraries form the backbone of this data-driven modeling pipeline, enabling the discovery and analysis of complex dynamical systems from experimental data.

Additional Resources and References

[1] D. Daniels et al. (2018). Chapter 13: Parameter Estimation, Sloppiness, and Model Identifiability in "Quantitative Biology: Theory, Computational Methods and Examples of Models".
[2] Ryu et al. (2015). "Frequency modulation of ERK activation dynamics rewires cell fate". DOI
[3] Rackauckas et al. (2021). "Universal Differential Equations for Scientific Machine Learning". DOI
[4] Brunton et al. (2016). "Discovering governing equations from data by sparse identification of nonlinear dynamical systems". DOI
[5] Fasel et al. (2022). "Ensemble-SINDy: Robust sparse model discovery in the low-data, high-noise limit, with active learning and control". DOI

License

This project is licensed under the terms of the MIT license.
