Conversation

schmoelder (Contributor) commented Jul 11, 2024

This PR implements a Surrogate class for fitting GPs on existing data from optimizations.

Supersedes #45

To do

Note: there is some WIP in #152 which will also affect this PR. Should we rebase onto that branch already so that we can adapt the corresponding interfaces, at the risk of some friction should there be more changes upstream?

  • Fit GP on existing population
  • Provide methods to evaluate objective functions, constraint functions, etc.
  • Validate surrogate model accuracy with (simple) test cases
  • Test if optimizers converge to the true solution when using the surrogate

Open questions

Alternative surrogate modeling approaches

Currently, only GPs are implemented. However, other surrogate models can be envisioned (e.g. ANNs). To allow for a more modular architecture, we could derive them from a common SurrogateBase class; see the sketch below.
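A minimal sketch of what that could look like (class and method names here are illustrative, not the PR's actual API):

```python
from abc import ABC, abstractmethod

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor


class SurrogateBase(ABC):
    """Common interface for all surrogate models."""

    @abstractmethod
    def fit(self, X: np.ndarray, F: np.ndarray) -> None:
        """Fit the surrogate to evaluated points X with objective values F."""

    @abstractmethod
    def estimate_objectives(self, X: np.ndarray) -> np.ndarray:
        """Return the surrogate estimate f*(x) for each row of X."""


class GaussianProcessSurrogate(SurrogateBase):
    """GP-backed implementation; an ANN variant would subclass the same base."""

    def __init__(self, **gp_kwargs):
        self.gpr = GaussianProcessRegressor(**gp_kwargs)

    def fit(self, X, F):
        self.gpr.fit(X, F)

    def estimate_objectives(self, X):
        return self.gpr.predict(X)
```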

Follow-up projects

Once this is merged, we can also start working on other features that would improve or apply the surrogate models. Eventually, these should be moved to their own issues / PRs, but for now, this is just a collection of ideas.

The interface

Currently, the SurrogateModel class somewhat mimics an OptimizationProblem, as it also provides methods for estimating objectives, nonlinear constraints, etc. To demonstrate this, compare the OptimizationProblem

```mermaid
sequenceDiagram
    User->>+OptimizationProblem: evaluate_objectives(x)
    OptimizationProblem->>+User: f(x)
```

with the SurrogateModel

```mermaid
sequenceDiagram
    User->>+SurrogateModel: estimate_objectives(x)
    SurrogateModel->>+User: f*(x)
```

However, it is important to note that the SurrogateModel will never provide all of the OptimizationProblem's functionality, such as specifying variables, constraints, etc. Hence, it cannot be used directly as an OptimizationProblem, e.g. to interface with an Optimizer.

To me, this means we should rethink the architecture and consider what exactly the SurrogateModel replaces. In the context of an OptimizationProblem, I would say it actually replaces the evaluation toolchain (the part that maps x to f/g/m/...).

Consequently, we should consider moving the evaluation toolchain from the OptimizationProblem to its own module (which in the process would also make the OptimizationProblem less of a "god class" and would even allow reusing the toolchain in other places) and introduce an EvaluationInterface.

The architecture would then look something like the following:

```mermaid
sequenceDiagram
    User->>+OptimizationProblem: evaluate_objectives(x)
    OptimizationProblem->>+EvaluationInterface: evaluate(x)
    EvaluationInterface->>+OptimizationProblem: f(x)
    OptimizationProblem->>+User: f(x)
```

The EvaluationInterface would then be implemented by both the EvaluationPipeline (i.e. the current "toolchain") and the SurrogateModel:

```mermaid
classDiagram
    class OptimizationProblem {
        evaluate_objectives(np.ndarray): np.ndarray
    }
    OptimizationProblem "1" *-- "1" EvaluationInterface

    class EvaluationInterface {
        <<interface>>
        +evaluate(np.ndarray): np.ndarray
    }

    EvaluationInterface <|.. SurrogateModel
    EvaluationInterface <|.. EvaluationPipeline
```
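In code, the shared interface could look roughly like this (all names are placeholders; the real toolchain internals are of course more involved):

```python
from abc import ABC, abstractmethod

import numpy as np


class EvaluationInterface(ABC):
    """What the OptimizationProblem delegates evaluation to."""

    @abstractmethod
    def evaluate(self, x: np.ndarray) -> np.ndarray:
        """Map parameters x to objective values."""


class EvaluationPipeline(EvaluationInterface):
    """The current 'toolchain': chains the (expensive) evaluators."""

    def __init__(self, evaluators):
        self.evaluators = evaluators

    def evaluate(self, x):
        result = x
        for evaluator in self.evaluators:
            result = evaluator(result)
        return result


class SurrogateModel(EvaluationInterface):
    """Drop-in replacement that returns the cheap GP estimate f*(x)."""

    def __init__(self, gpr):
        self.gpr = gpr

    def evaluate(self, x):
        return self.gpr.predict(np.atleast_2d(x))
```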

Conditioned optimization problems

One of the original ideas for this project came from optimization problems where we want to fix the value of one of the variables and then run the optimization to find the best point given this value.

For this purpose, we should implement a ConditionedOptimizationProblem which wraps the original OptimizationProblem and provides an interface where the fixed variables are removed. While this is trivial to implement for bound-constrained problems (see the sketch below), it becomes potentially more complicated for problems with linear and nonlinear constraints.
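A minimal sketch of the bound-constrained case (attribute names like n_variables and bounds are assumptions about the wrapped problem):

```python
import numpy as np


class ConditionedOptimizationProblem:
    """Wraps an OptimizationProblem with some variables fixed."""

    def __init__(self, problem, fixed):
        self.problem = problem
        self.fixed = fixed  # {variable index: fixed value}
        self.free = [i for i in range(problem.n_variables) if i not in fixed]

    @property
    def bounds(self):
        # Simply drop the bounds of the fixed variables.
        lb, ub = self.problem.bounds
        return np.asarray(lb)[self.free], np.asarray(ub)[self.free]

    def _expand(self, x_free):
        # Reinsert the fixed values to obtain a point in the full space.
        x = np.empty(self.problem.n_variables)
        x[self.free] = x_free
        for i, value in self.fixed.items():
            x[i] = value
        return x

    def evaluate_objectives(self, x_free):
        return self.problem.evaluate_objectives(self._expand(x_free))
```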

Plots

For process design, we are often not really interested in just the optimal point but in the general topology of the parameter space; e.g., we are interested in the contours of regions with a given purity. Finely sampling the parameter space with the full model would be very expensive, so the idea is to use a surrogate model for this purpose, as sketched below. See also partial dependence plots and #33.
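For instance (a sketch; `surrogate`, the bounds, and the purity threshold are illustrative):

```python
import matplotlib.pyplot as plt
import numpy as np

# Dense grid over two variables; cheap because only the surrogate is called.
x0 = np.linspace(0.0, 1.0, 200)
x1 = np.linspace(0.0, 1.0, 200)
X0, X1 = np.meshgrid(x0, x1)
X_grid = np.column_stack([X0.ravel(), X1.ravel()])

purity = surrogate.estimate_objectives(X_grid).reshape(X0.shape)

# Contour of the region meeting a purity requirement (threshold is an example).
plt.contour(X0, X1, purity, levels=[0.95])
plt.xlabel("variable 0")
plt.ylabel("variable 1")
plt.show()
```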

schmoelder (Contributor, Author) commented:
Notes from Call with @maxsiska

Normalization / StandardScaler

  • Check if normalization allows for negative values
  • Check if log-normalization is possible (note, this might become an issue for linear constraints)
  • Check if inverse_transform (when estimating eval_functions) works independently of the Scaler used when return_cov is True (see the sketch below)

```python
from sklearn.preprocessing import StandardScaler
from sklearn.gaussian_process import GaussianProcessRegressor

# Scale inputs and outputs before fitting the GP
X_scaler = StandardScaler().fit(X)
Y_scaler = StandardScaler().fit(Y)

gpr = GaussianProcessRegressor()
```
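Regarding the last checklist item: with a StandardScaler on Y, the predicted mean can go through inverse_transform directly, but the standard deviation (or covariance) has to be rescaled manually, because the scaler's mean shift cancels out. A sketch, assuming a single output column and a hypothetical X_new:

```python
gpr.fit(X_scaler.transform(X), Y_scaler.transform(Y))

Y_mean_scaled, Y_std_scaled = gpr.predict(
    X_scaler.transform(X_new), return_std=True
)

# The mean maps back via inverse_transform ...
Y_mean = Y_scaler.inverse_transform(Y_mean_scaled.reshape(-1, 1))
# ... while the std only needs the scale factor; for return_cov=True,
# the covariance would scale with Y_scaler.scale_ ** 2 instead.
Y_std = Y_std_scaled * Y_scaler.scale_
```
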
schmoelder (Contributor, Author) commented Jul 15, 2024:

Consider allowing specification of hyperparameters via kwargs (with reasonable defaults); see the sketch after the lists below.

See also: https://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessRegressor.html

Important parameters:

  • kernel: default is ConstantKernel(1.0, constant_value_bounds="fixed") * RBF(1.0, length_scale_bounds="fixed") (consider moving to a separate method, or exposing their parameters in the signature)
  • alpha: Parameter to handle noisy data, default=1e-10
  • optimizer: Custom optimizer for the kernel’s parameters (could improve performance, e.g. GA)
  • normalize_y (do we need to inverse transform when evaluating?)

Other suggestions:

  • Consider checking whether the fit was "OK" (could also be part of some validation method)
  • Implement heuristic for length scale based on bounds (e.g. length_scale = 0.5 * (ub - lb))
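A sketch of how such a constructor could look, combining the kwargs idea with the length-scale heuristic (class name, signature, and the bounds arguments are assumptions):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel


class Surrogate:
    def __init__(self, lower_bounds, upper_bounds, kernel=None, alpha=1e-10, **gp_kwargs):
        if kernel is None:
            # Heuristic: anisotropic length scale derived from the bounds.
            lb = np.asarray(lower_bounds)
            ub = np.asarray(upper_bounds)
            kernel = ConstantKernel(1.0) * RBF(length_scale=0.5 * (ub - lb))

        self.gpr = GaussianProcessRegressor(kernel=kernel, alpha=alpha, **gp_kwargs)
```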

```python
) -> Any:

X = np.array(X)
X_2d = np.array(X, ndmin=2)
```
schmoelder (Contributor, Author) commented:

Technically, this does not "ensure 2D" (ndmin=2 only enforces a minimum of two dimensions). Consequently, we should instead reshape to really ensure 2D; see the sketch below.

Also, consider moving the function out of the class, since it duplicates what is done in the OptimizationProblem class.
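A possible replacement (a sketch; n_variables stands for the number of optimization variables and is an assumption here):

```python
import numpy as np

# Reshape to (n_samples, n_variables); unlike ndmin=2, this also fails
# loudly if the input cannot be interpreted as rows of n_variables entries.
X_2d = np.asarray(X).reshape(-1, n_variables)
```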

```python
X_test = surrogate.population.feasible.x[0:2]

F_test = surrogate.population.feasible.f[0:2]
F_est = surrogate.estimate_objectives(X_test)
```
schmoelder (Contributor, Author) commented:

Here, we need to use a separate validation set to test whether the surrogate predicts well enough.
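For instance, hold out part of the feasible population and compare the estimates against the true objectives there (a sketch, assuming the surrogate was fit on the remaining individuals; the split is arbitrary):

```python
import numpy as np

# Every 5th feasible individual serves as validation data (illustrative).
X_val = surrogate.population.feasible.x[::5]
F_val = surrogate.population.feasible.f[::5]

F_est = surrogate.estimate_objectives(X_val)

# Simple accuracy metric: mean absolute error of the estimates.
mae = np.mean(np.abs(F_est - F_val))
```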

ronald-jaepel and others added 26 commits March 26, 2025 08:51, including:

  • This plot could potentially replace the corner plot.
  • With this commit, the behavior for deriving alternative solution objects also changes. Instead of modifying the original array, the following methods / properties now return a new SolutionIO object: resample(), normalize(), (anti)derivative, smooth_data()
  • Do not round hopsy problem when computing chebyshev center (Co-authored-by: r.jaepel <[email protected]>)