Categorical features along side continuous features? #3
Hi @fountaindive, thanks for your interest in the categorical normalising flows!
Hi @phlippe, ah great yeah that makes perfect sense. I am definitely going to try this now!
Hi @phlippe, I was wondering if you would be interested in helping me put together a minimum working example of modelling a toy dataset with continuous and categorical features? I'm happy to add it to the repo as an example notebook. I'm toying around with 2 continuous features and a single categorical feature, but I'm not really sure how to use your library 😅 Many thanks either way :)
Hi @fountaindive, sure that would be great! Let me summarize the important modules/classes and steps needed:
Hope that helps! Let me know if you have any questions or face any issues :)
Hi @phlippe, thank you very much for your detailed notes, I really appreciate it! I'm trying a slightly simpler case first, which I'll expand from later. Suppose my dataset is just two columns of continuous features. I'm trying to build a "Tabular" flow model class to model this. I think I've got most of the code written, but something is wrong and perhaps you can help? The logic is as follows: I'd like to model some data, and I think the simplest flow model would have the following layers: permute, coupling, permute, coupling.
I'm using the InvertibleConv and CouplingLayer classes. For the Coupling Network I use a small dense network with 1 input and 2 outputs: 1 input because we have 2 input features but will mask 50% of them with CouplingLayer.create_channel_mask, and 2 outputs for the shift and the scale. Currently I'm getting a shape error:

import sys
sys.path.append("path_to_directory/CategoricalNF")
import numpy as np
import torch
import torch.nn as nn
from layers.flows.flow_model import FlowModel
from layers.flows.permutation_layers import InvertibleConv
from layers.flows.coupling_layer import CouplingLayer

class CouplingNetwork(nn.Module):
    def __init__(self, c_in, c_out, hidden_size):
        """
        This neural network models the shift and scale parameters
        of the coupling layer.
        """
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(c_in, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, c_out),
        )

    def forward(self, x):
        return self.model(x)


class FlowTabularModelling(FlowModel):
    def __init__(self):
        super().__init__(layers=None, name="Tabular")
        self._create_layers()
        self.print_overview()

    def _create_layers(self):
        """
        I want the simplest model possible:
            input: 2 features
            permute
            coupling
            permute
            coupling
            output: 2 features
        """
        n_dim = 2
        c_in = 1  # we only have one input to the CouplingNetwork because we only have 2 features and we mask 50%
        c_out = 2  # two parameters: 1 for the shift and 1 for the scale in the coupling flow?
        model_func = lambda c_out: CouplingNetwork(c_in=c_in, c_out=c_out, hidden_size=128)
        # Will mask half the features at a time?
        coupling_mask = CouplingLayer.create_channel_mask(n_dim)
        layers = [
            InvertibleConv(n_dim),
            CouplingLayer(n_dim, coupling_mask, model_func, c_out=c_out),
            InvertibleConv(n_dim),
            CouplingLayer(n_dim, coupling_mask, model_func, c_out=c_out),
        ]
        self.flow_layers = nn.ModuleList(layers)

    def forward(self, z, ldj=None, reverse=False, length=None):
        return super().forward(z, ldj=ldj, reverse=reverse, length=length)

And here is some fake data to test the forward pass:

x = np.random.uniform(size=(10, 1))
x = x.astype(np.float32)
x = torch.from_numpy(x)
ftm = FlowTabularModelling()
ftm.forward(x)

Which is currently giving the following shape error
Hopefully that makes sense! I'm still new to normalising flows so might have gotten some terminology wrong. Thanks again!
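For reference, the layer logic described above (permute, coupling, permute, coupling on two continuous features) can be sketched in plain NumPy. This is only an illustration of the shapes involved and does not use the CategoricalNF API: the function names (`init_conditioner`, `conditioner`, `coupling_forward`, `coupling_reverse`, `flip`, `flow_forward`, `flow_reverse`) are hypothetical, a fixed feature swap stands in for the learned InvertibleConv permutation, the weights are random and untrained, and the `tanh` on the log-scale is an assumed stabilising choice.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_conditioner(hidden_size=16):
    """Random weights for a tiny MLP: 1 input -> hidden -> 2 outputs (hypothetical helper)."""
    return {
        "w1": rng.normal(scale=0.5, size=(1, hidden_size)),
        "b1": np.zeros(hidden_size),
        "w2": rng.normal(scale=0.5, size=(hidden_size, 2)),
        "b2": np.zeros(2),
    }


def conditioner(x_kept, p):
    """Map the untouched feature (batch, 1) to a shift and a log-scale, each (batch, 1)."""
    h = np.maximum(x_kept @ p["w1"] + p["b1"], 0.0)  # ReLU hidden layer
    out = h @ p["w2"] + p["b2"]                       # (batch, 2)
    shift = out[:, :1]
    log_scale = np.tanh(out[:, 1:])                   # assumption: bound the log-scale
    return shift, log_scale


def coupling_forward(x, p):
    """x has shape (batch, 2): feature 0 passes through, feature 1 is transformed."""
    x_kept, x_changed = x[:, :1], x[:, 1:]
    shift, log_scale = conditioner(x_kept, p)
    z_changed = x_changed * np.exp(log_scale) + shift
    ldj = log_scale.sum(axis=1)                       # log|det J| per sample
    return np.concatenate([x_kept, z_changed], axis=1), ldj


def coupling_reverse(z, p):
    """Exact inverse of coupling_forward."""
    z_kept, z_changed = z[:, :1], z[:, 1:]
    shift, log_scale = conditioner(z_kept, p)
    x_changed = (z_changed - shift) * np.exp(-log_scale)
    return np.concatenate([z_kept, x_changed], axis=1)


def flip(x):
    """Swap the two features; a fixed stand-in for a learned permutation."""
    return x[:, ::-1]


params = [init_conditioner(), init_conditioner()]


def flow_forward(x):
    ldj = np.zeros(x.shape[0])
    for p in params:                                  # permute, coupling, permute, coupling
        x = flip(x)
        x, step_ldj = coupling_forward(x, p)
        ldj += step_ldj
    return x, ldj


def flow_reverse(z):
    for p in reversed(params):                        # undo the layers in reverse order
        z = coupling_reverse(z, p)
        z = flip(z)                                   # the swap is its own inverse
    return z


x = rng.uniform(size=(10, 2))                         # one column per continuous feature
z, ldj = flow_forward(x)
x_rec = flow_reverse(z)
print(z.shape, ldj.shape, np.allclose(x, x_rec))
```

In this sketch the conditioner really does take 1 input and return 2 outputs, but the data itself still has one column per feature, i.e. shape (10, 2). The test input in the post above has shape (10, 1), which may be worth checking, although how the library's layers expect their input to be shaped has not been verified here.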
Hi,
Really interesting work on categorical normalising flows (CNF), I'm reading your paper now.
I'm interested in applying normalising flows to generic tabular datasets that can have both continuous and categorical features. Is it possible to combine CNF with standard normalising flows to cater for generic tabular datasets?
Many thanks!