
Commit f80b6e9

make release-tag: Merge branch 'main' into stable
2 parents 3a9a307 + 844e223 commit f80b6e9


19 files changed: 464 additions & 259 deletions

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+name: Dependency Checker
+on:
+  schedule:
+    - cron: '0 0 * * 1-5'
+  workflow_dispatch:
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v3
+    - name: Set up Python 3.9
+      uses: actions/setup-python@v4
+      with:
+        python-version: 3.9
+    - name: Install dependencies
+      run: |
+        python -m pip install .[dev]
+        make check-deps OUTPUT_FILEPATH=latest_requirements.txt
+    - name: Create pull request
+      id: cpr
+      uses: peter-evans/create-pull-request@v4
+      with:
+        token: ${{ secrets.GH_ACCESS_TOKEN }}
+        commit-message: Update latest dependencies
+        title: Automated Latest Dependency Updates
+        body: "This is an auto-generated PR with **latest** dependency updates."
+        branch: latest-dependency-update
+        branch-suffix: short-commit-hash
+        base: main
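The workflow's `make check-deps` step relies on the `check-deps` target that this commit adds to the Makefile: it pipes `pip freeze` through `grep -v "CTGAN.git"` and then through `grep -E` with an allow list of package prefixes. As a rough illustration only, the same filter can be sketched in Python; the function name `filter_freeze` is hypothetical and is not part of the project.

```python
import re

# Same alternation the Makefile passes to `grep -E`; like grep, it matches anywhere in a line.
ALLOW_LIST = r'numpy=|pandas=|scikit-learn=|tqdm=|torch=|rdt='


def filter_freeze(freeze_output: str) -> list[str]:
    """Hypothetical helper: keep allow-listed `pip freeze` lines, skipping the CTGAN checkout."""
    kept = []
    for line in freeze_output.splitlines():
        if 'CTGAN.git' in line:          # mirrors `grep -v "CTGAN.git"`
            continue
        if re.search(ALLOW_LIST, line):  # mirrors `grep -E $(allow_list)`
            kept.append(line)
    return kept


sample = "numpy==1.26.4\nrequests==2.31.0\npandas==2.2.1\n"
print(filter_freeze(sample))  # ['numpy==1.26.4', 'pandas==2.2.1']
```

Because the pattern is an unanchored substring match, a line such as `geopandas==...` would also pass the `pandas=` alternative; anchoring each name with `^` would tighten it, at the cost of diverging from the Makefile's behavior.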

.github/workflows/readme.yml

Lines changed: 2 additions & 0 deletions
@@ -22,5 +22,7 @@ jobs:
       run: |
         python -m pip install --upgrade pip
         python -m pip install invoke rundoc .
+        python -m pip install tomli
+        python -m pip install packaging
     - name: Run the README.md
       run: invoke readme
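The README job now also installs `tomli` and `packaging`, presumably so that the `invoke readme` task (not shown in this commit) can read project metadata from `pyproject.toml` now that `setup.py` is gone. The following is a minimal sketch of that kind of read. It assumes a PEP 621 `[project]` table, and the helper name `project_dependencies` is hypothetical.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:  # older interpreters: the `tomli` backport exposes the same API
    import tomli as tomllib

from packaging.requirements import Requirement


def project_dependencies(pyproject_text: str) -> dict[str, str]:
    """Hypothetical helper: map each [project] dependency name to its specifier string."""
    data = tomllib.loads(pyproject_text)
    deps = data.get('project', {}).get('dependencies', [])
    result = {}
    for dep in deps:
        req = Requirement(dep)  # parses name, version specifiers and markers
        result[req.name] = str(req.specifier)
    return result


example = '''
[project]
name = "example"
dependencies = ["numpy>=1.21.0", "tqdm>=4.29,<5"]
'''
print(project_dependencies(example))
```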

CONTRIBUTING.rst

Lines changed: 2 additions & 2 deletions
@@ -219,9 +219,9 @@ This will perform the following actions:
 2. Bump the current version to the next release candidate, ``X.Y.Z.dev(N+1)``
 
 After this is done, the new pre-release can be installed by including the ``dev`` section in the
-dependency specification, either in ``setup.py``::
+dependency specification, either in ``pyproject.toml``::
 
-    install_requires = [
+    dependencies = [
         ...
         'ctgan>=X.Y.Z.dev',
         ...
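The ``dev`` suffix in the specifier matters. Under PEP 440 semantics, a version specifier does not match pre-releases (which include ``.devN`` builds) unless the specifier itself references a pre-release. The `packaging` library can be used to check this. The version numbers below are examples only.

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# A plain lower bound does not admit a development release such as 0.9.1.dev1 ...
plain = SpecifierSet('>=0.9.0')
# ... but a bound that names a dev release (as in 'ctgan>=X.Y.Z.dev') does.
dev = SpecifierSet('>=0.9.1.dev')

candidate = Version('0.9.1.dev1')
print(candidate in plain)  # False
print(candidate in dev)    # True
```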

HISTORY.md

Lines changed: 31 additions & 0 deletions
@@ -1,5 +1,36 @@
 # History
 
+## v0.9.1 - 2024-03-14
+
+This release changes the `loss_values` attribute of a CTGAN model to contain floats instead of `torch.Tensors`.
+
+### New Features
+
+* Return loss values as float values not PyTorch objects - Issue [#332](https://github.com/sdv-dev/CTGAN/issues/332) by @fealho
+
+### Maintenance
+
+* Transition from using setup.py to pyproject.toml to specify project metadata - Issue [#333](https://github.com/sdv-dev/CTGAN/issues/333) by @R-Palazzo
+* Remove bumpversion and use bump-my-version - Issue [#334](https://github.com/sdv-dev/CTGAN/issues/334) by @R-Palazzo
+* Add dependency checker - Issue [#336](https://github.com/sdv-dev/CTGAN/issues/336) by @amontanez24
+
+## v0.9.0 - 2024-02-13
+
+This release makes CTGAN sampling more efficient by saving the frequency of each categorical value.
+
+### New Features
+
+* Improve DataSampler efficiency - Issue [#327] ((https://github.com/sdv-dev/CTGAN/issue/327)) by @fealho
+
+## v0.8.0 - 2023-11-13
+
+This release adds a progress bar that will show when setting the `verbose` parameter to `True`
+when initializing `TVAE`.
+
+### New Features
+
+* Add verbosity TVAE (progress bar + save the loss values) - Issue [#300]((https://github.com/sdv-dev/CTGAN/issues/300) by @frances-h
+
 ## v0.7.5 - 2023-10-05
 
 This release adds a progress bar that will show when setting the `verbose` parameter to True when initializing `CTGAN`. It also removes a warning that was showing.

MANIFEST.in

Lines changed: 0 additions & 11 deletions
This file was deleted.

Makefile

Lines changed: 14 additions & 15 deletions
@@ -76,16 +76,9 @@ install-test: clean-build clean-pyc ## install the package and test dependencies
 install-develop: clean-build clean-pyc ## install the package in editable mode and dependencies for development
 	pip install -e .[dev]
 
-MINIMUM := $(shell sed -n '/install_requires = \[/,/]/p' setup.py | head -n-1 | tail -n+2 | sed 's/ *\(.*\),$?$$/\1/g' | tr '>' '=')
-
-.PHONY: install-minimum
-install-minimum: ## install the minimum supported versions of the package dependencies
-	pip install $(MINIMUM)
-
 
 # LINT TARGETS
 
-
 .PHONY: lint
 lint: ## check style with flake8 and isort
 	invoke lint
@@ -138,8 +131,7 @@ coverage: ## check code coverage quickly with the default Python
 
 .PHONY: dist
 dist: clean ## builds source and wheel package
-	python setup.py sdist
-	python setup.py bdist_wheel
+	python -m build --wheel --sdist
 	ls -l dist
 
 .PHONY: publish-confirm
@@ -161,34 +153,34 @@ publish: dist publish-confirm ## package and upload a release
 bumpversion-release: ## Merge main to stable and bumpversion release
 	git checkout stable || git checkout -b stable
 	git merge --no-ff main -m"make release-tag: Merge branch 'main' into stable"
-	bumpversion release
+	bump-my-version bump release
 	git push --tags origin stable
 
 .PHONY: bumpversion-release-test
 bumpversion-release-test: ## Merge main to stable and bumpversion release
 	git checkout stable || git checkout -b stable
 	git merge --no-ff main -m"make release-tag: Merge branch 'main' into stable"
-	bumpversion release --no-tag
+	bump-my-version bump release --no-tag
 	@echo git push --tags origin stable
 
 .PHONY: bumpversion-patch
 bumpversion-patch: ## Merge stable to main and bumpversion patch
 	git checkout main
 	git merge stable
-	bumpversion --no-tag patch
+	bump-my-version bump --no-tag patch
 	git push
 
 .PHONY: bumpversion-candidate
 bumpversion-candidate: ## Bump the version to the next candidate
-	bumpversion candidate --no-tag
+	bump-my-version bump candidate --no-tag
 
 .PHONY: bumpversion-minor
 bumpversion-minor: ## Bump the version the next minor skipping the release
-	bumpversion --no-tag minor
+	bump-my-version bump --no-tag minor
 
 .PHONY: bumpversion-major
 bumpversion-major: ## Bump the version the next major skipping the release
-	bumpversion --no-tag major
+	bump-my-version bump --no-tag major
 
 .PHONY: bumpversion-revert
 bumpversion-revert: ## Undo a previous bumpversion-release
@@ -238,3 +230,10 @@ release-minor: check-release bumpversion-minor release
 
 .PHONY: release-major
 release-major: check-release bumpversion-major release
+
+# Dependency targets
+
+.PHONY: check-deps
+check-deps:
+	$(eval allow_list='numpy=|pandas=|scikit-learn=|tqdm=|torch=|rdt=')
+	pip freeze | grep -v "CTGAN.git" | grep -E $(allow_list) > $(OUTPUT_FILEPATH)
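The deleted `install-minimum` target built its pins by running `sed` over the `install_requires` block of `setup.py` and then `tr '>' '='`, which turned each `name>=X` requirement into `name==X`. With the metadata moved to `pyproject.toml`, that pipeline no longer has anything to read. The following is a simplified, hypothetical Python version of the same transformation, applied to requirement strings like those in a `[project]` `dependencies` array. It is not a target shipped by this commit.

```python
from packaging.requirements import Requirement


def minimum_pins(dependencies: list[str]) -> list[str]:
    """Hypothetical helper: turn 'name>=X' requirements into 'name==X' pins.

    Simplified relative to the old `tr '>' '='` pipeline: requirements without
    a '>=' bound are skipped, and upper bounds and markers are dropped.
    """
    pins = []
    for dep in dependencies:
        req = Requirement(dep)
        lower = [spec.version for spec in req.specifier if spec.operator == '>=']
        if lower:
            pins.append(f'{req.name}=={lower[0]}')
    return pins


print(minimum_pins(['numpy>=1.21.0', 'tqdm>=4.29,<5']))  # ['numpy==1.21.0', 'tqdm==4.29']
```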

ctgan/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 
 __author__ = 'DataCebo, Inc.'
 __email__ = '[email protected]'
-__version__ = '0.7.5'
+__version__ = '0.9.1.dev1'
 
 from ctgan.demo import load_demo
 from ctgan.synthesizers.ctgan import CTGAN

ctgan/data_sampler.py

Lines changed: 15 additions & 14 deletions
@@ -7,7 +7,7 @@ class DataSampler(object):
     """DataSampler samples the conditional vector and corresponding data for CTGAN."""
 
     def __init__(self, data, output_info, log_frequency):
-        self._data = data
+        self._data_length = len(data)
 
         def is_discrete_column(column_info):
             return (len(column_info) == 1
@@ -115,33 +115,34 @@ def sample_original_condvec(self, batch):
         if self._n_discrete_columns == 0:
             return None
 
+        category_freq = self._discrete_column_category_prob.flatten()
+        category_freq = category_freq[category_freq != 0]
+        category_freq = category_freq / np.sum(category_freq)
+        col_idxs = np.random.choice(np.arange(len(category_freq)), batch, p=category_freq)
         cond = np.zeros((batch, self._n_categories), dtype='float32')
-
-        for i in range(batch):
-            row_idx = np.random.randint(0, len(self._data))
-            col_idx = np.random.randint(0, self._n_discrete_columns)
-            matrix_st = self._discrete_column_matrix_st[col_idx]
-            matrix_ed = matrix_st + self._discrete_column_n_category[col_idx]
-            pick = np.argmax(self._data[row_idx, matrix_st:matrix_ed])
-            cond[i, pick + self._discrete_column_cond_st[col_idx]] = 1
+        cond[np.arange(batch), col_idxs] = 1
 
         return cond
 
-    def sample_data(self, n, col, opt):
+    def sample_data(self, data, n, col, opt):
         """Sample data from original training data satisfying the sampled conditional vector.
 
+        Args:
+            data:
+                The training data.
         Returns:
-            n rows of matrix data.
+            n:
+                n rows of matrix data.
         """
         if col is None:
-            idx = np.random.randint(len(self._data), size=n)
-            return self._data[idx]
+            idx = np.random.randint(len(data), size=n)
+            return data[idx]
 
         idx = []
         for c, o in zip(col, opt):
             idx.append(np.random.choice(self._rid_by_cat_cols[c][o]))
 
-        return self._data[idx]
+        return data[idx]
 
     def dim_cond_vec(self):
         """Return the total number of categories."""
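The rewritten `sample_original_condvec` replaces the per-row Python loop with one `np.random.choice` draw over all categories and a single fancy-indexing assignment. The standalone sketch below reproduces that idea outside the class. It relies on three assumptions: the probability matrix has one row per discrete column and is zero-padded to the widest column; every real category has a nonzero probability, since otherwise the `!= 0` filter would misalign indices; and the function name and `rng` parameter are illustrative.

```python
import numpy as np


def sample_original_condvec(category_prob, batch, rng=None):
    """Illustrative sketch: draw `batch` one-hot conditional vectors in one shot.

    category_prob: 2-D array, one row per discrete column holding that column's
    category probabilities, right-padded with zeros (assumed layout).
    """
    rng = np.random.default_rng() if rng is None else rng
    freq = category_prob.flatten()
    freq = freq[freq != 0]        # drop padding: one entry per real category
    freq = freq / np.sum(freq)    # renormalize over the categories of all columns
    idxs = rng.choice(len(freq), size=batch, p=freq)
    cond = np.zeros((batch, len(freq)), dtype='float32')
    cond[np.arange(batch), idxs] = 1  # one-hot rows via fancy indexing, no Python loop
    return cond


probs = np.array([[0.5, 0.5, 0.0],
                  [0.2, 0.3, 0.5]])
cond = sample_original_condvec(probs, 8, rng=np.random.default_rng(0))
print(cond.shape)  # (8, 5)
```

If each row sums to one, every discrete column receives the same total mass after renormalization. This mirrors the old loop's uniform choice of column, followed by a category drawn according to that column's probabilities.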

ctgan/synthesizers/ctgan.py

Lines changed: 6 additions & 6 deletions
@@ -175,8 +175,7 @@ def __init__(self, embedding_dim=128, generator_dim=(256, 256), discriminator_di
         self._transformer = None
         self._data_sampler = None
         self._generator = None
-
-        self.loss_values = pd.DataFrame(columns=['Epoch', 'Generator Loss', 'Distriminator Loss'])
+        self.loss_values = None
 
     @staticmethod
     def _gumbel_softmax(logits, tau=1, hard=False, eps=1e-10, dim=-1):
@@ -355,7 +354,8 @@ def fit(self, train_data, discrete_columns=(), epochs=None):
                     condvec = self._data_sampler.sample_condvec(self._batch_size)
                     if condvec is None:
                         c1, m1, col, opt = None, None, None, None
-                        real = self._data_sampler.sample_data(self._batch_size, col, opt)
+                        real = self._data_sampler.sample_data(
+                            train_data, self._batch_size, col, opt)
                     else:
                         c1, m1, col, opt = condvec
                         c1 = torch.from_numpy(c1).to(self._device)
@@ -365,7 +365,7 @@ def fit(self, train_data, discrete_columns=(), epochs=None):
                         perm = np.arange(self._batch_size)
                         np.random.shuffle(perm)
                         real = self._data_sampler.sample_data(
-                            self._batch_size, col[perm], opt[perm])
+                            train_data, self._batch_size, col[perm], opt[perm])
                         c2 = c1[perm]
 
                     fake = self._generator(fakez)
@@ -422,8 +422,8 @@ def fit(self, train_data, discrete_columns=(), epochs=None):
                 loss_g.backward()
                 optimizerG.step()
 
-            generator_loss = loss_g.detach().cpu()
-            discriminator_loss = loss_d.detach().cpu()
+            generator_loss = loss_g.detach().cpu().item()
+            discriminator_loss = loss_d.detach().cpu().item()
 
             epoch_loss_df = pd.DataFrame({
                 'Epoch': [i],
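In `fit`, CTGAN now passes `train_data` to `sample_data` on every call, because `DataSampler` records only `len(data)` and its row-index lookups instead of holding the training matrix. One likely side effect is that a fitted model, and anything that pickles it, no longer keeps the transformed training matrix alive through the sampler. The class below is a hypothetical, stripped-down illustration of that calling convention and is not the real `DataSampler`.

```python
import numpy as np


class RowSampler:
    """Hypothetical stand-in for DataSampler's row sampling.

    Like the refactored class, it stores only the data length and per-category
    row indices; callers pass `data` into every sample_data call.
    """

    def __init__(self, data, rid_by_cat_cols):
        self._data_length = len(data)
        # rid_by_cat_cols[c][o]: row indices where discrete column c takes category o
        self._rid_by_cat_cols = rid_by_cat_cols

    def sample_data(self, data, n, col, opt, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        if col is None:
            # No condition: n rows drawn uniformly with replacement.
            idx = rng.integers(len(data), size=n)
            return data[idx]
        # Conditional: one matching row per (column, category) pair; as in the
        # original, the number of rows follows len(col), not n.
        idx = [rng.choice(self._rid_by_cat_cols[c][o]) for c, o in zip(col, opt)]
        return data[idx]


data = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
sampler = RowSampler(data, [[np.array([0, 2]), np.array([1, 3])]])
rows = sampler.sample_data(data, 2, np.array([0, 0]), np.array([1, 1]))
print(rows)  # both rows are [0. 1.]: only rows 1 and 3 have category 1
```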

ctgan/synthesizers/tvae.py

Lines changed: 35 additions & 2 deletions
@@ -1,11 +1,13 @@
 """TVAE module."""
 
 import numpy as np
+import pandas as pd
 import torch
 from torch.nn import Linear, Module, Parameter, ReLU, Sequential
 from torch.nn.functional import cross_entropy
 from torch.optim import Adam
 from torch.utils.data import DataLoader, TensorDataset
+from tqdm import tqdm
 
 from ctgan.data_transformer import DataTransformer
 from ctgan.synthesizers.base import BaseSynthesizer, random_state
@@ -112,7 +114,8 @@ def __init__(
         batch_size=500,
         epochs=300,
         loss_factor=2,
-        cuda=True
+        cuda=True,
+        verbose=False
     ):
 
         self.embedding_dim = embedding_dim
@@ -123,6 +126,8 @@ def __init__(
         self.batch_size = batch_size
         self.loss_factor = loss_factor
         self.epochs = epochs
+        self.loss_values = pd.DataFrame(columns=['Epoch', 'Batch', 'Loss'])
+        self.verbose = verbose
 
         if not cuda or not torch.cuda.is_available():
             device = 'cpu'
@@ -159,7 +164,15 @@ def fit(self, train_data, discrete_columns=()):
             list(encoder.parameters()) + list(self.decoder.parameters()),
             weight_decay=self.l2scale)
 
-        for i in range(self.epochs):
+        self.loss_values = pd.DataFrame(columns=['Epoch', 'Batch', 'Loss'])
+        iterator = tqdm(range(self.epochs), disable=(not self.verbose))
+        if self.verbose:
+            iterator_description = 'Loss: {loss:.3f}'
+            iterator.set_description(iterator_description.format(loss=0))
+
+        for i in iterator:
+            loss_values = []
+            batch = []
             for id_, data in enumerate(loader):
                 optimizerAE.zero_grad()
                 real = data[0].to(self._device)
@@ -176,6 +189,26 @@ def fit(self, train_data, discrete_columns=()):
                 optimizerAE.step()
                 self.decoder.sigma.data.clamp_(0.01, 1.0)
 
+                batch.append(id_)
+                loss_values.append(loss.detach().cpu().item())
+
+            epoch_loss_df = pd.DataFrame({
+                'Epoch': [i] * len(batch),
+                'Batch': batch,
+                'Loss': loss_values
+            })
+            if not self.loss_values.empty:
+                self.loss_values = pd.concat(
+                    [self.loss_values, epoch_loss_df]
+                ).reset_index(drop=True)
+            else:
+                self.loss_values = epoch_loss_df
+
+            if self.verbose:
+                iterator.set_description(
+                    iterator_description.format(
+                        loss=loss.detach().cpu().item()))
+
     @random_state
     def sample(self, samples):
         """Sample data similar to the training data.
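TVAE now records one row per batch in `loss_values`, with columns `Epoch`, `Batch` and `Loss`. Each loss is converted to a Python float with `.item()`, and each epoch's frame is concatenated onto the running frame only when that running frame is non-empty. The helper below isolates that bookkeeping, with plain floats standing in for the per-batch losses. The function name and the sample numbers are illustrative.

```python
import pandas as pd


def append_epoch_losses(loss_values: pd.DataFrame, epoch: int,
                        batch_losses: list[float]) -> pd.DataFrame:
    """Illustrative helper mirroring the per-epoch loss bookkeeping in TVAE.fit."""
    epoch_loss_df = pd.DataFrame({
        'Epoch': [epoch] * len(batch_losses),
        'Batch': list(range(len(batch_losses))),  # TVAE records the DataLoader index id_
        'Loss': batch_losses,                     # plain floats, as .item() yields
    })
    if loss_values.empty:
        # First epoch: use the new frame directly instead of concatenating
        # onto the column-only placeholder.
        return epoch_loss_df
    return pd.concat([loss_values, epoch_loss_df]).reset_index(drop=True)


losses = pd.DataFrame(columns=['Epoch', 'Batch', 'Loss'])
losses = append_epoch_losses(losses, 0, [0.9, 0.7])
losses = append_epoch_losses(losses, 1, [0.5, 0.4])
print(losses.groupby('Epoch')['Loss'].mean())
```

The first epoch's frame is used as-is instead of being concatenated onto the empty placeholder. As a result, the accumulated frame keeps that frame's numeric dtypes without relying on how `pd.concat` treats empty inputs.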
