ContinuousApproximator.sample() fails without previous adapter calls (e.g., when loading data) #255
Comments
Based on how I understand what you are doing, I agree that this should be handled differently. Just to make sure I understand you correctly, could you add a small example here that includes (only) the relevant code parts?
I looked further into the issue. As far as I can see, it is caused by the `OfflineDataset` and the approximator no longer referring to the same adapter object in memory. Here is some reduced pseudocode to keep things concise.

Simulating at the beginning does not fail:

```python
adapter = Adapter()
data = OfflineDataset(simulate(), adapter)
approximator = ContinuousApproximator(summary_net, inference_net, adapter)
approximator.fit(data)
approximator.sample(data)
```

When the data is loaded from an external source (where the adapter was also supplied to `OfflineDataset`), sampling fails:

```python
adapter = Adapter()
data = load_data(path)
approximator = ContinuousApproximator(summary_net, inference_net, adapter)
approximator.fit(data)
approximator.sample(data)
```

Calling the adapter manually before sampling fixes the error:

```python
adapter = Adapter()
data = load_data(path)
approximator = ContinuousApproximator(summary_net, inference_net, adapter)
approximator.fit(data)
_ = adapter(data)
approximator.sample(data)
```

Creating the data manually before sampling (i.e., simply creating a new `OfflineDataset`) does not fix it, since the adapter is not called during `OfflineDataset` construction:

```python
adapter = Adapter()
data = load_data(path)
approximator = ContinuousApproximator(summary_net, inference_net, adapter)
approximator.fit(data)
data_2 = OfflineDataset(simulate(), adapter)
approximator.sample(data_2)
```
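The shared-object mechanism described above can be illustrated with a minimal, self-contained sketch. `StatefulAdapter` below is a hypothetical stand-in, not BayesFlow's actual `Adapter` class: the first strict `forward` pass records state on that particular instance, so a relaxed pass on a freshly constructed instance fails in the same way as the reported error.

```python
class StatefulAdapter:
    """Hypothetical stand-in for a stateful adapter (not the real bf.Adapter):
    a strict forward pass must run at least once, e.g. during training,
    before relaxed (strict=False) passes are allowed."""

    def __init__(self):
        self._fitted_keys = None  # populated on the first strict pass

    def forward(self, data, strict=True):
        if strict:
            self._fitted_keys = set(data)  # record state on this instance
        elif self._fitted_keys is None:
            raise ValueError(
                "Cannot call `forward` with `strict=False` before "
                "calling `forward` with `strict=True`."
            )
        return dict(data)


# Same object used for training and sampling: works.
shared = StatefulAdapter()
shared.forward({"x": 1.0, "theta": 2.0})   # strict pass during training
shared.forward({"x": 1.0}, strict=False)   # relaxed pass during sampling

# Fresh object, as in a second script: the relaxed pass fails.
fresh = StatefulAdapter()
try:
    fresh.forward({"x": 1.0}, strict=False)
except ValueError as error:
    print(error)
```

In these terms, an adapter restored together with a pickled dataset corresponds to `shared`, while an adapter constructed anew in a separate script corresponds to `fresh`, which is why only the former can serve relaxed calls immediately.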
Thank you! This is very helpful! @LarsKue and @stefanradev93, what are your takes on how to fix this?
Indeed, when passing `OfflineDataset.adapter` to the approximator, the error is gone (so it is not really a bug but more of an unexpected behavior). But this is a rather unintuitive solution that should not be required of users:

```python
data = load_data(path)
approximator = ContinuousApproximator(summary_net, inference_net, data.adapter)
approximator.fit(data)
approximator.sample(data)
```
It will appear to users as a bug because it should just work. In any case, we should fix it before the 2.0 release.
Could be faulty serialization in the Adapter. I will investigate next week.
@LarsKue Is this issue fixed already?
Thanks for the bump. I fail to see the issue, or it is not reproducible for me. Consider the following working snippet:

```python
import os

os.environ["KERAS_BACKEND"] = "torch"

import bayesflow as bf
import keras
import numpy as np

data = {
    "x": np.random.standard_normal(size=(32, 2)),
    "theta": np.random.standard_normal(size=(32, 2)),
}

adapter = bf.Adapter()
adapter.to_array()
adapter.rename("x", "inference_variables")
adapter.rename("theta", "inference_conditions")

dataset = bf.OfflineDataset(data, batch_size=2, adapter=adapter)

inference_network = bf.networks.FlowMatching()
approximator = bf.ContinuousApproximator(adapter=adapter, inference_network=inference_network)
approximator.compile(optimizer="adam")
approximator.build_from_data(
    keras.tree.map_structure(keras.ops.convert_to_tensor, dataset[0])
)

# optional: approximator.fit(...)

conditions = {"inference_conditions": dataset[0]["inference_conditions"]}
samples = approximator.sample(num_samples=32, conditions=conditions)

approximator.save("m.keras")

# later:
approximator = keras.saving.load_model("m.keras")

# generate new data
conditions = {
    "theta": np.random.standard_normal(size=(32, 2)),
}

# uses the existing adapter under the hood
samples = approximator.sample(num_samples=32, conditions=conditions)
```
I will close this issue for now. @elseml, feel free to reopen it if it occurs again.
Thanks for looking into this. The issue does not relate to model loading but to data-loading situations, and it will be relevant for offline training workflows. Here, a standard approach would be to simulate some data with script A and train a network with it in a separate script B. As I wrote above, the issue is not a technical bug but rather an unexpected behavior: intuitively, users might define the same adapter at the start of each script. Then, the adapter in script B is first called when sampling, raising the error below. @paul-buerkner could you reopen the issue? Your snippet can be modified as follows to reproduce the error:

Script A:

```python
import os

os.environ["KERAS_BACKEND"] = "torch"

import bayesflow as bf
import numpy as np
import pickle

data = {
    "x": np.random.standard_normal(size=(32, 2)),
    "theta": np.random.standard_normal(size=(32, 2)),
}

adapter = bf.Adapter()
adapter.to_array()
adapter.rename("x", "inference_variables")
adapter.rename("theta", "inference_conditions")

dataset = bf.OfflineDataset(data, batch_size=2, adapter=adapter)

with open("test_dataset.pkl", "wb") as f:
    pickle.dump(dataset, f)
```

Script B:

```python
import os

os.environ["KERAS_BACKEND"] = "torch"

import bayesflow as bf
import keras
import numpy as np
import pickle

with open("test_dataset.pkl", "rb") as f:
    dataset = pickle.load(f)

inference_network = bf.networks.FlowMatching()

adapter = bf.Adapter()
adapter.to_array()
adapter.rename("x", "inference_variables")
adapter.rename("theta", "inference_conditions")

approximator = bf.ContinuousApproximator(adapter=adapter, inference_network=inference_network)
approximator.compile(optimizer="adam")
approximator.build_from_data(
    keras.tree.map_structure(keras.ops.convert_to_tensor, dataset[0])
)

# optional: approximator.fit(...)

conditions = {"inference_conditions": dataset[0]["inference_conditions"]}
samples = approximator.sample(num_samples=32, conditions=conditions)

approximator.save("m.keras")

# later:
approximator = keras.saving.load_model("m.keras")

# generate new data
conditions = {
    "theta": np.random.standard_normal(size=(32, 2)),
}

# uses the existing adapter under the hood
samples = approximator.sample(num_samples=32, conditions=conditions)
```

This code raises:

> RuntimeError: Cannot call `inverse` before calling `forward` at least once.

The specific `ValueError` reported above occurred for an adapter using a common transform:

> ValueError: Cannot call `forward` with `strict=False` before calling `forward` with `strict=True`.
@elseml Why not save your raw data to file rather than wrapping it in `OfflineDataset`?
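One way to follow this suggestion is to persist only the raw simulation dict in script A and construct the `OfflineDataset` in script B, so that the dataset and the approximator can be handed the very same adapter object. A minimal sketch using only stdlib types (the file name `raw_data.pkl` and the plain-list data are illustrative, standing in for the NumPy arrays used above):

```python
import pickle
import random

# Script A: save only the raw simulations, not the OfflineDataset wrapper.
raw_data = {
    "x": [[random.gauss(0.0, 1.0) for _ in range(2)] for _ in range(32)],
    "theta": [[random.gauss(0.0, 1.0) for _ in range(2)] for _ in range(32)],
}
with open("raw_data.pkl", "wb") as f:
    pickle.dump(raw_data, f)

# Script B: reload the raw data and wrap it here, next to the approximator,
# e.g. (assuming the BayesFlow API from the snippets in this thread):
#   dataset = bf.OfflineDataset(loaded, batch_size=2, adapter=adapter)
#   approximator = bf.ContinuousApproximator(adapter=adapter, ...)
with open("raw_data.pkl", "rb") as f:
    loaded = pickle.load(f)
```

With this layout, only one adapter instance ever exists per training run, so the dataset and the approximator cannot drift apart.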
I agree that creating the `OfflineDataset` directly in the training script is the cleaner approach; perhaps this could be pointed out in the documentation.
@elseml Yes, I think this would be a good addition. We could even hint at the intended use of `OfflineDataset` there.
I noticed that after switching from generating `bf.datasets` on-the-fly to loading pre-simulated data, `ContinuousApproximator.sample()` fails since the adapter is not called before sampling anymore. Concretely, in line 141 of `continuous_approximator.py`, the adapter is called with `strict=False` to process the observed data (and not require parameter keys while doing so). This raises the following error in the adapter's `forward()` method when working with loaded data:

> ValueError: Cannot call `forward` with `strict=False` before calling `forward` with `strict=True`.

The error is easily fixed by manually calling the adapter on the data before sampling, but it is of course unexpected for the user and should therefore be handled internally.

@LarsKue @stefanradev93: what do you think would be a principled handling of this behavior?
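One hypothetical internal handling (a sketch only, not BayesFlow's actual code; `Adapter` and `sample_conditions` below are illustrative stand-ins): the sampling path could fall back to priming the adapter with a single strict pass when the relaxed call fails on an adapter that has never been used.

```python
class Adapter:
    """Hypothetical stand-in: relaxed passes require one prior strict pass."""

    def __init__(self):
        self.primed = False

    def forward(self, data, strict=True):
        if strict:
            self.primed = True
        elif not self.primed:
            raise ValueError(
                "Cannot call `forward` with `strict=False` before "
                "calling `forward` with `strict=True`."
            )
        return dict(data)


def sample_conditions(adapter, conditions):
    """Sketch of a possible internal fallback: if the relaxed call fails
    because the adapter was never used, prime it with one strict pass
    and retry."""
    try:
        return adapter.forward(conditions, strict=False)
    except ValueError:
        adapter.forward(conditions, strict=True)  # prime the adapter once
        return adapter.forward(conditions, strict=False)


fresh = Adapter()
result = sample_conditions(fresh, {"x": 1.0})  # works even on a fresh adapter
```

Whether a strict pass over observation-only data is valid depends on the real adapter's semantics (a strict pass normally also requires the parameter keys), so this is only one of several options; alternatively, the documentation could steer users toward building the `OfflineDataset` and the approximator around a single adapter instance.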