
Expose parameters from DeepEcho PARSynthesizer in SDV (e.g. data_types) #1164

@Mohamed209

Description

Environment details

  • SDV version: 0.17.2
  • Python version: 3.9.13
  • Operating System: Windows 10

Question description

I have a dataset where some of the real features seem to follow a NegativeBinomial distribution, so, per the paper:

*(screenshot from the paper)*

I want to force the training loss to use a NegativeBinomial distribution for those features.

From the DeepEcho PAR model source (`par.py`), the data type inference:

```python
for field in self._output_columns:
    dtype = timeseries_data[field].dtype
    kind = dtype.kind
    if kind in ('i', 'f'):
        # Integer and float columns are both lumped into 'continuous';
        # there is no path to 'count' from here.
        data_type = 'continuous'
    elif kind in ('O', 'b'):
        data_type = 'categorical'
    else:
        raise ValueError(f'Unsupported dtype {dtype}')
```
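
A quick illustration of the problem for count data (pandas only; the column values are made up):

```python
import pandas as pd

# An integer count column has dtype kind 'i', so the inference above
# classifies it as 'continuous' rather than 'count'.
counts = pd.Series([0, 3, 1, 7], dtype='int64')
print(counts.dtype.kind)  # prints: i
```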

so all features will be inferred as continuous. Then during training, in the loss computation:

```python
for key, props in self._data_map.items():
    if props['type'] in ['continuous', 'timestamp']:
        # Continuous columns: Gaussian likelihood with a
        # softplus-constrained sigma.
        mu_idx, sigma_idx, missing_idx = props['indices']
        mu = Y_padded[:, :, mu_idx]
        sigma = torch.nn.functional.softplus(Y_padded[:, :, sigma_idx])
        missing = torch.nn.LogSigmoid()(Y_padded[:, :, missing_idx])

        for i in range(batch_size):
            dist = torch.distributions.normal.Normal(
                mu[:seq_len[i], i], sigma[:seq_len[i], i])
            log_likelihood += torch.sum(dist.log_prob(X_padded[-seq_len[i]:, i, mu_idx]))

            # Bernoulli log-likelihood for the missingness indicator.
            p_true = X_padded[:seq_len[i], i, missing_idx]
            p_pred = missing[:seq_len[i], i]
            log_likelihood += torch.sum(p_true * p_pred)
            log_likelihood += torch.sum((1.0 - p_true) * torch.log(
                1.0 - torch.exp(p_pred)))

    elif props['type'] in ['count']:
        # Count columns: NegativeBinomial likelihood -- this is the
        # branch I want my features to hit.
        r_idx, p_idx, missing_idx = props['indices']
        r = torch.nn.functional.softplus(Y_padded[:, :, r_idx]) * props['range']
        p = torch.sigmoid(Y_padded[:, :, p_idx])
        x = X_padded[:, :, r_idx] * props['range']
        missing = torch.nn.LogSigmoid()(Y_padded[:, :, missing_idx])

        for i in range(batch_size):
            dist = torch.distributions.negative_binomial.NegativeBinomial(
                r[:seq_len[i], i], p[:seq_len[i], i], validate_args=False)
            log_likelihood += torch.sum(dist.log_prob(x[:seq_len[i], i]))

            # Same missingness term as the continuous branch.
            p_true = X_padded[:seq_len[i], i, missing_idx]
            p_pred = missing[:seq_len[i], i]
            log_likelihood += torch.sum(p_true * p_pred)
            log_likelihood += torch.sum((1.0 - p_true) * torch.log(
                1.0 - torch.exp(p_pred)))
```
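
For reference, here is a minimal standalone check (plain PyTorch, illustrative parameter values only) of the NegativeBinomial log-likelihood that the 'count' branch would give me:

```python
import torch

# NegativeBinomial parameterized by total_count (r) and success
# probability (p), matching the 'count' branch above.
r = torch.tensor([2.0, 2.0])
p = torch.tensor([0.3, 0.3])
dist = torch.distributions.NegativeBinomial(total_count=r, probs=p)
print(dist.log_prob(torch.tensor([0.0, 5.0])))
```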

but because of the dtype inference above, all my features are modeled as Gaussian, which is not correct for my case. Can the `data_types` argument be exposed through SDV so that I can mark these columns as 'count'?
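
A sketch of the workaround I would like to avoid needing: calling DeepEcho's PARModel directly with an explicit data_types override. Column names here are hypothetical, and I am assuming `data_types` accepts a per-column mapping that includes 'count':

```python
from deepecho import PARModel

# Hypothetical workaround: bypass SDV and fit DeepEcho's PAR model
# directly, forcing the count column onto the NegativeBinomial loss.
# 'entity_id' and 'num_events' are placeholder column names.
model = PARModel(epochs=128, cuda=False)
model.fit(
    data=timeseries_data,
    entity_columns=['entity_id'],
    data_types={'num_events': 'count'},
)
synthetic = model.sample(num_entities=10)
```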
