
WIP: ENH,TST: partial fit implementation with tests #47

Open

eiviani-lanl wants to merge 15 commits into main from eiviani_partial_fit

Conversation

@eiviani-lanl (Collaborator)

partial_fit() using normal equations

Moore-Penrose pseudoinverse:
D+ = (D.T @ D)^-1 @ D.T

Least squares solution:
D+ @ y = (D.T @ D)^-1 @ D.T @ y

We're persisting the gram matrix (D.T @ D) and moment matrix (D.T @ y) and updating them by adding the gram and moment matrices of each consecutive batch.

Update rule:
Consider the summation representation of matrix multiplication:
(B.T @ B)_ij = sum_k (B_ki * B_kj)

Now suppose the full design matrix is formed by vertically stacking
two batches, written as [B_1; B_2]:
[B_1; B_2].T @ [B_1; B_2] = B_1.T @ B_1 + B_2.T @ B_2
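A minimal numpy sketch of this update rule (illustrative shapes and data, not the gfdl API): accumulating the gram and moment matrices batch by batch and then solving the normal equations once reproduces the full-batch least-squares solution.

```python
import numpy as np

# Sketch of the batched normal-equations update described above.
rng = np.random.default_rng(0)
D = rng.standard_normal((200, 6))   # full design matrix
y = rng.standard_normal(200)

gram = np.zeros((6, 6))             # persisted D.T @ D
moment = np.zeros(6)                # persisted D.T @ y
for start in range(0, 200, 50):     # four batches of 50 rows
    Db, yb = D[start:start + 50], y[start:start + 50]
    gram += Db.T @ Db               # add B_i.T @ B_i
    moment += Db.T @ yb             # add B_i.T @ y_i

coeff_batched = np.linalg.solve(gram, moment)
coeff_full = np.linalg.lstsq(D, y, rcond=None)[0]
# coeff_batched agrees with coeff_full to machine precision
```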

@tylerjereddy tylerjereddy added the enhancement New feature or request label Feb 5, 2026
@tylerjereddy tylerjereddy added this to the 0.2.0 milestone Feb 5, 2026
@tylerjereddy (Collaborator)

Thanks, I've added the 0.2.0 release milestone since trying to merge this right before branching for a release would probably be ill-advised.

@tylerjereddy (Collaborator)

I just resolved the merge conflicts and force-pushed. Let me see if I can at least take a quick look over it.

@pytest.mark.parametrize("hidden_layer_sizes", [(10,), (5, 5)])
@pytest.mark.parametrize("direct_links", [True, False])
@pytest.mark.parametrize("activation", activations)
@pytest.mark.parametrize("weight_scheme", weights[1:])

The CI error is happening because of changes made during the sklearn PR review that weren't applied here.

Playing around a bit locally, the diff below the fold allows both ruff and the pytest tests to pass, but do check my work to make sure it is sensible:

Details
--- a/src/gfdl/tests/test_regression.py
+++ b/src/gfdl/tests/test_regression.py
@@ -7,7 +7,9 @@ from sklearn.model_selection import train_test_split
 from sklearn.preprocessing import StandardScaler
 from sklearn.utils.estimator_checks import parametrize_with_checks
 
+from gfdl.activations import ACTIVATIONS
 from gfdl.model import GFDLRegressor
+from gfdl.weights import WEIGHTS
 
 
 @pytest.mark.parametrize("""n_samples,
@@ -145,8 +147,8 @@ def test_regression_boston(reg_alpha, expected):
 
 @pytest.mark.parametrize("hidden_layer_sizes", [(10,), (5, 5)])
 @pytest.mark.parametrize("direct_links", [True, False])
-@pytest.mark.parametrize("activation", activations)
-@pytest.mark.parametrize("weight_scheme", weights[1:])
+@pytest.mark.parametrize("activation", ACTIVATIONS.keys())
+@pytest.mark.parametrize("weight_scheme", list(WEIGHTS.keys())[1:])
 @pytest.mark.parametrize("alpha", [None, 0.1, 0.5, 1])
 def test_partial_fit_regressor(

@pytest.mark.parametrize("direct_links", [True, False])
@pytest.mark.parametrize("n_classes", [2, 5])
@pytest.mark.parametrize("activation", activations)
@pytest.mark.parametrize("weight_init", weights[1:])

Here and elsewhere, I think you can remove the redefinition/duplication of the activations and weights variables in the testing module, and just import them from activations.py and weights.py, respectively, where the same information is available as dictionary keys.


(that may make sense as a separate cleanup PR though; I believe this also came up in the sklearn-focused review)

else:
    pf_model.partial_fit(Xb, yb)

assert_allclose(ff_model.coeff_, pf_model.coeff_, rtol=1e-5, atol=1e-5)

Since assert_allclose is documented to check actual, expected in that order, here and elsewhere we should probably flip the order so that if errors show up, the traceback correctly shows "actual" as the partial fit value under test that is not meeting expectations.
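For reference, numpy documents the signature as assert_allclose(actual, desired, ...); a toy sketch of the suggested ordering, using stand-in arrays rather than the real coefficients:

```python
import numpy as np
from numpy.testing import assert_allclose

desired = np.array([1.0, 2.0, 3.0])   # stand-in for ff_model.coeff_ (expected)
actual = desired + 1e-7               # stand-in for pf_model.coeff_ (under test)

# actual first, desired second, so a failure message labels the
# partial-fit values as "actual"
assert_allclose(actual, desired, rtol=1e-5, atol=1e-5)
```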

ff_model.fit(X, y)

classes = np.unique(y)
batch = 25

I noticed that if I change the batch size to 17 locally, one of the Moore-Penrose cases ends up with a numerical issue that causes a failure.

That may be related to the requirement for well-conditioned matrices you note above. We'll need to be clear on what the limitations/weaknesses with partial_fit are, and double check that they are things we cannot improve. One useful exercise might be to check this test with i.e., MLPClassifier to see if its partial_fit is a bit more robust in this regard.
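One plausible mechanism for the batch-size sensitivity, sketched below with synthetic data (not the failing case): forming the gram matrix squares the singular values, and hence the condition number, of the design matrix, so a batch that is merely ill-conditioned can yield a numerically rank-deficient D.T @ D.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((100, 5))
D[:, 4] = D[:, 3] + 1e-6 * rng.standard_normal(100)  # near-duplicate column

c_D = np.linalg.cond(D)          # condition number of the design matrix
c_G = np.linalg.cond(D.T @ D)    # condition number of the gram matrix
# c_G is approximately c_D ** 2
```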


(and of course, to assess if using rtol + partial_fit actually allows us to overcome many of these issues, or not)

pf_model.partial_fit(Xb, yb)

for ff_c, pf_c in zip(ff_model.coeffs_, pf_model.coeffs_, strict=False):
    assert_allclose(ff_c, pf_c, rtol=1e-5, atol=4e-4)

I can't reproduce the need to manually iterate over the objects here--both coeffs_ objects always seem to be compatibly-shaped array-like objects, so it is likely faster and more informative to just check the full breadth of values directly:

-    for ff_c, pf_c in zip(ff_model.coeffs_, pf_model.coeffs_, strict=False):
-        assert_allclose(ff_c, pf_c, rtol=1e-5, atol=4e-4)
+    assert_allclose(ff_model.coeffs_, pf_model.coeffs_, rtol=1e-5, atol=4e-4)

and, of course, probably best to reverse the order so actual is the thing under test (partial fitting)


Also, with that change, I believe this test could be combined with test_partial_fit_classifier above by placing the estimator object under parametrization as well, which I suppose might reduce duplication/maintenance burden here a bit.

# NOTE: for Moore-Penrose, a large singular value
# cutoff (rcond) is required to achieve reasonable accuracy
# with the Wisconsin breast cancer dataset
# Without rtol accuracy ~= 0.6316

Excellent, that's good to know. Does this apply only for partial_fit() on that dataset, or to full fit() as well? Based on this, I wonder if it might be beneficial to:

  1. Perhaps add a Notes docstring section to the partial_fit() function(s) as appropriate to concisely express the potential importance of setting rtol appropriately? I can't remember if we already do that for fit() proper, which of course also has the same issue with Moore-Penrose.
  2. Perhaps also include some of the math details you describe in a few places in the Notes section of the docstring for partial_fit(). I believe you currently have that info as a detailed code comment that an end user wouldn't see without looking at our source. Not sure we want quite that much detail, but it may be useful to describe at least the core algorithm used.

def test_rtol_partial_fit(Classifier, ridge_alpha, expected):
    X, y = load_breast_cancer(return_X_y=True)

    X = StandardScaler().fit_transform(X)

I think you can just delete this since scaling is happening again below. Tests pass locally when I do that.

I suppose you could also save a line below by using fit_transform(X_train).


actual = model.score(X_test, y_test)
# RandomForestRegressor() with default params scores 0.958 here
# RVFL with above params scores comparatively:

nice

# Order-invariance test for ensemble partial_fit
# Unfortunately because ensembleGFDL has its' own partial_fit()
# implementation and the difference in comparing coeffs we need
# to repeat a lot of tests

Hmm, I was able to fuse them in a pretty straightforward way by using getattr to compensate for plural vs. singular coeff_; the fact that assert_allclose() accepts array-likes does the rest of the work. With the diff below the fold, the testsuite passes, runs the same number of tests as before, and saves a decent number of repeated lines.

Details
--- a/src/gfdl/tests/test_model.py
+++ b/src/gfdl/tests/test_model.py
@@ -636,7 +636,6 @@ def test_partial_fit_ensemble(
 def test_rtol_partial_fit(Classifier, ridge_alpha, expected):
     X, y = load_breast_cancer(return_X_y=True)
 
-    X = StandardScaler().fit_transform(X)
     X_train, X_test, y_train, y_test = train_test_split(
         X, y, test_size=0.2, random_state=42, shuffle=True
     )
@@ -688,7 +687,10 @@ def test_partial_fit_classes_error(Classifier):
         clf.partial_fit(X[25:], y_bad)
 
 
-def test_batch_order_invariance():
+@pytest.mark.parametrize("estimator, attr", [
+    (GFDLClassifier, "coeff_"),
+    (EnsembleGFDLClassifier, "coeffs_")])
+def test_batch_order_invariance(estimator, attr):
     # Order-invariance test for classifier partial_fit
     X, y = make_classification(
         n_samples=400,
@@ -699,7 +701,7 @@ def test_batch_order_invariance():
         random_state=0,
     )
 
-    ff_model = GFDLClassifier(seed=0, reg_alpha=0.1)
+    ff_model = estimator(seed=0, reg_alpha=0.1)
     pf_model = clone(ff_model)
 
     ff_model.fit(X, y)
@@ -717,44 +719,8 @@ def test_batch_order_invariance():
         else:
             pf_model.partial_fit(Xb, yb)
 
-    assert_allclose(ff_model.coeff_, pf_model.coeff_, rtol=4e-8, atol=2e-10)
-
-
-def test_batch_order_invariance_ensemble():
-    # Order-invariance test for ensemble partial_fit
-    # Unfortunately because ensembleGFDL has its' own partial_fit()
-    # implementation and the difference in comparing coeffs we need
-    # to repeat a lot of tests
-    X, y = make_classification(
-        n_samples=400,
-        n_features=20,
-        n_informative=10,
-        n_redundant=5,
-        n_classes=2,
-        random_state=0,
-    )
-
-    ff_model = EnsembleGFDLClassifier(seed=0, reg_alpha=0.1)
-    pf_model = clone(ff_model)
-
-    ff_model.fit(X, y)
-
-    classes = np.unique(y)
-    batch = 25
-    rng = np.random.default_rng(0)
-    starts = rng.permutation(np.arange(0, len(X), batch))
-
-    for j, start in enumerate(starts):
-        end = min(start + batch, len(X))
-        Xb = X[start:end]
-        yb = y[start:end]
-        if j == 0:
-            pf_model.partial_fit(Xb, yb, classes=classes)
-        else:
-            pf_model.partial_fit(Xb, yb)
-
-    for ff_c, pf_c in zip(ff_model.coeffs_, pf_model.coeffs_, strict=False):
-        assert_allclose(ff_c, pf_c, rtol=4e-8, atol=2e-10)
+    assert_allclose(getattr(pf_model, attr),
+                    getattr(ff_model, attr), rtol=4e-8, atol=2e-10)
 

pf2.partial_fit(X[s:e], y[s:e], classes=classes)

for pf1_c, pf2_c in zip(pf1.coeffs_, pf2.coeffs_, strict=False):
    assert_allclose(pf1_c, pf2_c, rtol=4e-8, atol=2e-10)

I suspect you can use a similar strategy to fuse this test with the one above (based on my recent comments). I'd say it would be just fine to use the looser of the two tolerances when fusing.


# this should fail
for ff_c, pf_c in zip(ff_model.coeffs_, pf_model.coeffs_, strict=False):
    assert_allclose(ff_c, pf_c, rtol=1e-2, atol=1e-2)

I suspect you could also fuse with test above using the approaches mentioned elsewhere in the review (you can xfail on parametrized cases too).
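The xfail-on-parametrized-cases mechanics could look like this (tolerances, ids, and the test body are illustrative, not the actual suite):

```python
import pytest

# Per-case xfail marks let the "should fail" scenario share the fused test.
tolerance_cases = [
    pytest.param(4e-8, 2e-10, id="strict"),
    pytest.param(
        1e-2, 1e-2, id="known-bad",
        marks=pytest.mark.xfail(reason="coefficients diverge in this configuration"),
    ),
]

@pytest.mark.parametrize("rtol, atol", tolerance_cases)
def test_coeff_agreement(rtol, atol):
    ...  # shared comparison of full-fit vs partial-fit coefficients
```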

# through summation

if self.reg_alpha is not None and self.reg_alpha < 0.0:
    raise ValueError("Negative reg_alpha. Expected range : None or [0.0, inf).")

If you don't mind adding a test for the error handling here, that would be helpful; this line isn't hit yet:

(screenshot: coverage report showing the uncovered line)

def partial_fit(self, X, Y):

    if self.reg_alpha is not None and self.reg_alpha < 0.0:
        raise ValueError("Negative reg_alpha. Expected range : None or [0.0, inf).")

same here:

(screenshot: coverage report showing the uncovered line)

reg = np.identity(self.As[i].shape[0]) * self.reg_alpha
coef_ = np.linalg.solve(self.As[i] + reg, self.Bs[i])
if coef_.ndim == 2 and coef_.shape[1] == 1:
    coef_ = coef_.ravel()

This special case isn't hit by testing yet--if it is important to preserve we should try to hit it with a test case:

(screenshot: coverage report showing the uncovered branch)

@tylerjereddy (Collaborator)

Mostly a note to self--I went through test_model.py carefully today--I should check the other two modules when I get a bit more time.

What I've seen so far seems positive.

yb = y[start:end]
pf_model.partial_fit(Xb, yb)

np.testing.assert_allclose(ff_model.coeff_, pf_model.coeff_, rtol=1e-5, atol=4e-5)

The usual comment about "actual" value (pf_model.coeff_) coming first applies here as well.

A more substantial comment is that test_regression.py currently doesn't really check for the model performance with partial_fit, so that might be good to do on one of the easy-to-load datasets. I believe you did add some performance testing for partial_fit for the classifiers above with the breast cancer dataset, so something equivalent for regression may be nice.

That said, I think it can be pretty simple--maybe not necessary to be quite as thorough on the regression as on the classification here since I believe both will ultimately flush similar codepaths in the end.

# [B_1 | B_2].T @ [B_1 | B_2] = B_1.T @ B_1 + B_2.T @ B_2
#
# It's evident that it is indeed possible to update normal equations
# through summation

I believe I mentioned this elsewhere, but it may very well be worth documenting this in a user-facing way (i.e., in a docstring Notes section) for the appropriate partial_fit methods (probably not much value on the base class, but documenting on the user-consumed classes may have value).

While you're at it, documenting some concise guidance on the cases where the user might consider tuning rtol when using partial_fit may also have value.

self.b_.append(
    self._weight_mode(1, hidden_layer_sizes[0], rng=rng)
    .reshape(-1)
)

The first-call initialization above appears to be shared with EnsembleGFDL partial_fit()--perhaps we could avoid the code duplication by abstraction.

# or (n_samples, sum_hidden)
if self.direct_links:
    Hs.append(X)
D = np.hstack(Hs)

This code block, starting from the Hs = [] above, is starting to look very familiar. It seems to be identical to the one in the fit() method of the GFDL base class. There may be some opportunity to reduce code duplication.

if self.rtol is None:
    self.coeff_ = np.linalg.pinv(self.A) @ self.B
else:
    self.coeff_ = np.linalg.pinv(self.A, rcond=self.rtol) @ self.B

If you check other parts of the codebase with git grep -E -i "self\.rtol", you'll see that we're using the newer array API conformant version of the argument via rtol=self.rtol. That's probably preferable moving forward; rtol=None falls back to a community-standard default when not specified. If I simplify accordingly, the tests still pass locally:

--- a/src/gfdl/model.py
+++ b/src/gfdl/model.py
@@ -195,10 +195,7 @@ class GFDL(BaseEstimator):
         # MoorePenrose Pseudo-Inverse, otherwise use ridge regularized form.
 
         if self.reg_alpha is None:
-            if self.rtol is None:
-                self.coeff_ = np.linalg.pinv(self.A) @ self.B
-            else:
-                self.coeff_ = np.linalg.pinv(self.A, rcond=self.rtol) @ self.B
+                self.coeff_ = np.linalg.pinv(self.A, rtol=self.rtol) @ self.B


(and remove one level of indentation)

    self.coeff_ = np.linalg.pinv(self.A, rcond=self.rtol) @ self.B
else:
    reg = np.identity(self.A.shape[0]) * self.reg_alpha
    self.coeff_ = np.linalg.solve(self.A + reg, self.B)

Does this mean it is out of the question to use sklearn Ridge as we do for regular fit()? Does that have any consequences? In principle, we'd have less flexibility in the solver with partial_fit() this way, right?
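For what it's worth, the normal-equations ridge update and a dedicated ridge solver target the same minimizer; a numpy-only sanity check on synthetic data (a sketch, not a claim about solver parity in edge cases): solve(A + alpha*I, B) with A = D.T @ D and B = D.T @ y matches the least-squares solution of the augmented system [D; sqrt(alpha)*I] w ~ [y; 0].

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.standard_normal((80, 4))
y = rng.standard_normal(80)
alpha = 0.1

# ridge via persisted normal equations, as in this PR
A, B = D.T @ D, D.T @ y
w_normal = np.linalg.solve(A + alpha * np.identity(4), B)

# same ridge problem posed as an augmented least-squares system
D_aug = np.vstack([D, np.sqrt(alpha) * np.identity(4)])
y_aug = np.concatenate([y, np.zeros(4)])
w_aug = np.linalg.lstsq(D_aug, y_aug, rcond=None)[0]
# w_normal agrees with w_aug
```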

# (n_hidden,)
self.b_.append(
    self._weight_mode(1, layer, rng=rng).reshape(-1)
)

the code above looks identical to what is used for fit() above--perhaps consider abstracting it to avoid duplication/maintenance burden

if self.rtol is None:
    coef_ = np.linalg.pinv(self.As[i]) @ self.Bs[i]
else:
    coef_ = np.linalg.pinv(self.As[i], rcond=self.rtol) @ self.Bs[i]
@tylerjereddy (Collaborator, Feb 17, 2026)

similar comment here -- can use rtol argument of pinv to match array API standard, and that removes the need to check if self.rtol is None

    coef_ = np.linalg.pinv(self.As[i], rcond=self.rtol) @ self.Bs[i]
else:
    reg = np.identity(self.As[i].shape[0]) * self.reg_alpha
    coef_ = np.linalg.solve(self.As[i] + reg, self.Bs[i])

I suppose similar comment here re: situation with flexibility of this approach vs. all the solvers we have access to via Ridge with full fit().


# call base fit method
super().partial_fit(X, Y)
return self

This is identical to the partial_fit() of GFDLClassifier above? Potential to reduce code duplication?

object
Returns the partially fitted estimator.
"""
# shape: (n_samples, n_features)

minor: this comment doesn't have much value anymore

@tylerjereddy tylerjereddy changed the title ENH,TST: partial fit implementation with tests WIP: ENH,TST: partial fit implementation with tests Mar 4, 2026
@tylerjereddy (Collaborator)

I'll add a "WIP" (work in progress) to the title here since some merges aren't done correctly yet (e.g., old dependencies were accidentally added back).

