Merged

Changes from 6 commits
21 changes: 1 addition & 20 deletions src/diffpy/snmf/main.py
@@ -7,27 +7,8 @@
Y0 = np.loadtxt("input/W0.txt", dtype=float)
N, M = MM.shape

# Convert to DataFrames for display
# df_X = pd.DataFrame(X, columns=[f"Comp_{i+1}" for i in range(X.shape[1])])
# df_Y = pd.DataFrame(Y, columns=[f"Sample_{i+1}" for i in range(Y.shape[1])])
# df_MM = pd.DataFrame(MM, columns=[f"Sample_{i+1}" for i in range(MM.shape[1])])
# df_Y0 = pd.DataFrame(Y0, columns=[f"Sample_{i+1}" for i in range(Y0.shape[1])])

# Print the matrices
"""
print("Feature Matrix (X):\n", df_X, "\n")
print("Coefficient Matrix (Y):\n", df_Y, "\n")
print("Data Matrix (MM):\n", df_MM, "\n")
print("Initial Guess (Y0):\n", df_Y0, "\n")
"""


my_model = snmf_class.SNMFOptimizer(MM=MM, Y0=Y0, X0=X0, A=A0, components=2)
my_model = snmf_class.SNMFOptimizer(MM=MM, Y0=Y0, X0=X0, A=A0, n_components=2)
Contributor:

Looking at this, it seems that we have already instantiated some kind of SNMF class, and then this is doing the optimization. Is there a particular reason why we make this a class and not a function? It feels much more like a function to me. Could you think about what the downsides of making it a function would be? Is scikit-learn doing something like this too?

Author:

What scikit-learn has is an NMF class which instantiates, and then has the standard set of model class members like fit_transform and such. But to actually "do the math", NMF internally makes a call to non_negative_factorization. So the math should probably be a function, but unless (and until) we are literally merging this into scikit-learn, it's worth having a class. But instantiating the class should not be running math as it does currently. That's going to be the next thing I tackle, but this PR is getting a little long so I'm holding off.
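The split described above can be sketched in miniature: a thin class that only stores configuration and delegates the math to a plain function, the way scikit-learn's NMF delegates to non_negative_factorization. This is a hypothetical sketch using basic multiplicative NMF updates, not the sNMF math in this PR; the names NMFSketch and non_negative_factorization_sketch are made up for illustration:

```python
import numpy as np


def non_negative_factorization_sketch(MM, n_components, n_iter=100, seed=0):
    """Plain function that 'does the math': basic multiplicative-update NMF."""
    rng = np.random.default_rng(seed)
    N, M = MM.shape
    X = rng.random((N, n_components))  # nonnegative init keeps updates nonnegative
    Y = rng.random((n_components, M))
    eps = 1e-12  # guard against division by zero
    for _ in range(n_iter):
        Y *= (X.T @ MM) / (X.T @ X @ Y + eps)
        X *= (MM @ Y.T) / (X @ Y @ Y.T + eps)
    return X, Y


class NMFSketch:
    """Thin wrapper: stores config at init, runs the math only in fit_transform."""

    def __init__(self, n_components, n_iter=100, seed=0):
        self.n_components = n_components
        self.n_iter = n_iter
        self.seed = seed

    def fit_transform(self, MM):
        self.X_, self.Y_ = non_negative_factorization_sketch(
            MM, self.n_components, self.n_iter, self.seed
        )
        return self.X_
```

With this layout, instantiating the class does no work, which is the change described above for SNMFOptimizer.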

Contributor:

> it's worth having a class

I don't see the value in having this as a class, tbh. Can you justify why it is worth having a class for this, when we have already instantiated a higher-level class (I can't easily see in the browser what this class is called) as snmf_class?

print("Done")
# print(f"My final guess for X: {my_model.X}")
# print(f"My final guess for Y: {my_model.Y}")
# print(f"Compare to true X: {X_norm}")
# print(f"Compare to true Y: {Y_norm}")
np.savetxt("my_norm_X.txt", my_model.X, fmt="%.6g", delimiter=" ")
np.savetxt("my_norm_Y.txt", my_model.Y, fmt="%.6g", delimiter=" ")
np.savetxt("my_norm_A.txt", my_model.A, fmt="%.6g", delimiter=" ")
80 changes: 70 additions & 10 deletions src/diffpy/snmf/snmf_class.py
@@ -4,8 +4,69 @@


class SNMFOptimizer:
def __init__(self, MM, Y0=None, X0=None, A=None, rho=1e12, eta=610, max_iter=500, tol=5e-7, components=None):
print("Initializing SNMF Optimizer")
"""An implementation of stretched NMF (sNMF), including sparse stretched NMF.

Instantiating the SNMFOptimizer class runs all the analysis immediately.
The results matrices can then be accessed as instance attributes
of the class (X, Y, and A).

For more information on sNMF, please reference:
Gu, R., Rakita, Y., Lan, L. et al. Stretched non-negative matrix factorization.
npj Comput Mater 10, 193 (2024). https://doi.org/10.1038/s41524-024-01377-5
Contributor:

We would normally include a list of class attributes here: everything that is self.something. This obviously overlaps strongly with the arguments of the constructor, since many of the attributes get defined in the constructor, but logically they are different. Here we list and describe the class attributes; there we describe the __init__ function arguments.

Author:

I'm not clear on how I'd distinguish the arguments from the attributes. I understand how they are different semantically, but what part of that is necessary to make clear here? Can you give an example? Those have been helpful.

Contributor:

Everything that is self.something (except for methods, i.e. self.function(), which are not considered attributes) is an attribute. So MM, Y0, and X0 are attributes, but so are M, N, rng, num_updates, etc.

Inside a function or method, the parameters are the arguments of the function, so for the __init__() function they will be MM, Y0, X0, A, rho, eta, and so on. Some of the descriptions will overlap, but for the function arguments the user needs to know whether each is optional, what the default is, and anything else they need to know to successfully instantiate the class. People will generally not see the two docstrings at the same time, so some repetition is fine, but try to keep it short but informative.
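The convention described above can be shown with a toy example (the Demo class below is hypothetical, written in numpydoc style): the class docstring lists every self.something under an Attributes section, while the __init__ docstring lists the constructor's Parameters with their defaults and optionality. Note that some attributes (N, num_updates) have no matching parameter at all:

```python
import numpy as np


class Demo:
    """Toy class illustrating attributes vs. constructor parameters.

    Attributes
    ----------
    MM : ndarray
        The data matrix being decomposed.
    N : int
        Number of rows in MM. Derived in the constructor, not passed by the user.
    num_updates : int
        Running count of update steps. Starts at 0.
    """

    def __init__(self, MM):
        """Initialize a Demo instance.

        Parameters
        ----------
        MM : ndarray
            The data to decompose. Required; there is no default.
        """
        self.MM = MM
        self.N = MM.shape[0]  # attribute derived from a parameter
        self.num_updates = 0  # attribute with no matching parameter
```

The two docstrings repeat the MM description, which is fine: readers of help(Demo) and help(Demo.__init__) see them separately.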

Contributor:

pls confirm with a thumbs up if you saw this.

Author:

Added class attributes. Some of them are certainly redundant, but I will save a full cleanup for the next PR.

"""

def __init__(
self,
MM,
Y0=None,
X0=None,
A=None,
rho=1e12,
eta=610,
max_iter=500,
tol=5e-7,
n_components=None,
random_state=None,
):
"""Initialize an instance of SNMF and run the optimization.

Parameters
----------
MM : ndarray
The data to be decomposed. Shape is (length_of_signal, number_of_conditions).
Y0 : ndarray
The initial guesses for the component weights at each stretching condition.
Shape is (number_of_components, number_of_conditions). Must be provided if
n_components is not provided. Will override n_components if both are provided.
X0 : ndarray
The initial guesses for the intensities of each component per
row/sample/angle. Shape is (length_of_signal, number_of_components).
A : ndarray
The initial guesses for the stretching factor for each component, at each
condition. Shape is (number_of_components, number_of_conditions).
rho : float
The stretching factor that influences the decomposition. Zero corresponds to no
stretching present. Relatively insensitive and typically adjusted in powers of 10.
eta : float
The sparsity factor that influences the decomposition. Should be set to zero for
non-sparse data such as PDF. Can be used to improve results for sparse data such
as XRD, but due to instability, should be used only after first selecting the
best value for rho. Suggested adjustment is by powers of 2.
max_iter : int
The maximum number of times to update each of A, X, and Y before stopping
the optimization.
tol : float
The convergence threshold. This is the minimum fractional improvement in the
objective function to allow without terminating the optimization. Note that
a minimum of 20 updates are run before this parameter is checked.
n_components : int
The number of components to attempt to extract from MM. Note that this will
be overridden by Y0 if that is provided, but must be provided if no Y0 is
provided.
random_state : int
The seed for the random number generator used to create initial guesses
for any of Y0, A, and X0 that are not provided.
Contributor:

this is a little unclear. What matrices?

Author:

Should be clearer now.

"""

self.MM = MM
self.X0 = X0
self.Y0 = Y0
@@ -15,23 +76,22 @@ def __init__(self, MM, Y0=None, X0=None, A=None, rho=1e12, eta=610, max_iter=500
# Capture matrix dimensions
self.N, self.M = MM.shape
self.num_updates = 0
self.rng = np.random.default_rng(random_state)

if Y0 is None:
if components is None:
raise ValueError("Must provide either Y0 or a number of components.")
if n_components is None:
raise ValueError("Must provide either Y0 or n_components.")
else:
self.K = components
self.Y0 = np.random.beta(a=2.5, b=1.5, size=(self.K, self.M)) # This is untested
self.K = n_components
self.Y0 = self.rng.beta(a=2.5, b=1.5, size=(self.K, self.M))
else:
self.K = Y0.shape[0]

# Initialize A, X0 if not provided
if self.A is None:
self.A = np.ones((self.K, self.M)) + np.random.randn(self.K, self.M) * 1e-3 # Small perturbation
self.A = np.ones((self.K, self.M)) + self.rng.normal(0, 1e-3, size=(self.K, self.M))
if self.X0 is None:
self.X0 = np.random.rand(self.N, self.K) # Ensures values in [0,1]
self.X0 = self.rng.random((self.N, self.K))

# Initialize solution matrices to be iterated on
self.X = np.maximum(0, self.X0)
self.Y = np.maximum(0, self.Y0)

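The substance of this hunk is replacing calls on the global np.random state with a seeded np.random.default_rng Generator stored as self.rng. A stand-alone sketch of the same initialization logic (the helper name init_matrices is hypothetical; K, M, N follow the class's conventions) shows the payoff: the same random_state reproduces identical Y0, A, and X0:

```python
import numpy as np


def init_matrices(K, M, N, random_state=None):
    """Mirror the initialization in the diff using a seeded Generator."""
    rng = np.random.default_rng(random_state)
    # Component weights: beta(2.5, 1.5) draws, values in (0, 1)
    Y0 = rng.beta(a=2.5, b=1.5, size=(K, M))
    # Stretching factors: ones plus a small Gaussian perturbation
    A = np.ones((K, M)) + rng.normal(0, 1e-3, size=(K, M))
    # Component intensities: uniform draws in [0, 1)
    X0 = rng.random((N, K))
    return Y0, A, X0
```

Calling init_matrices twice with the same random_state yields bit-identical matrices, whereas the old np.random.beta / np.random.randn / np.random.rand calls depended on (and mutated) global state shared with any other code in the process.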