Commit aea42a5

Merge pull request #990 from weixuanfu/v0_11_1
v0.11.1 minor release
2 parents 3d31727 + e6e7ce6 commit aea42a5

21 files changed, +124 -103 lines changed

.travis.yml

-7
@@ -12,13 +12,6 @@ matrix:
       env: PYTHON_VERSION="3.7" COVERAGE="true" DASK_ML_VERSION="1.0.0"
       before_install:
         - wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
-    - name: "Python 3.7 on macOS"
-      os: osx
-      osx_image: xcode10.2 # Python 3.7.2 running on macOS 10.14.3
-      language: shell # 'language: python' is an error on Travis CI macOS
-      env: PYTHON_VERSION="3.7" DASK_ML_VERSION="1.0.0"
-      before_install:
-        - wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh
 install: source ./ci/.travis_install.sh
 script: bash ./ci/.travis_test.sh
 after_success:

docs/examples/index.html

+1 -1
@@ -203,7 +203,7 @@ <h2 id="overview">Overview</h2>
 <td>subscription prediction</td>
 <td>classification</td>
 <td align="center"><a href="https://archive.ics.uci.edu/ml/datasets/Bank+Marketing">link</a></td>
-<td align="center"><a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/Portuguese%20Bank%20Marketing/Portuguese%20Bank%20Marketing%20Stratergy.ipynb">link</a></td>
+<td align="center"><a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/Portuguese%20Bank%20Marketing/Portuguese%20Bank%20Marketing%20Strategy.ipynb">link</a></td>
 </tr>
 <tr>
 <td>MAGIC Gamma Telescope</td>

docs/index.html

+1 -1
@@ -213,5 +213,5 @@
 
 <!--
 MkDocs version : 0.17.2
-Build Date UTC : 2019-11-05 20:44:02
+Build Date UTC : 2020-01-03 17:34:52
 -->

docs/search/search_index.json

+2 -2
Large diffs are not rendered by default.

docs/sitemap.xml

+10 -10
@@ -4,79 +4,79 @@
 
 <url>
 <loc>http://epistasislab.github.io/tpot/</loc>
-<lastmod>2019-11-05</lastmod>
+<lastmod>2020-01-03</lastmod>
 <changefreq>daily</changefreq>
 </url>
 
 
 
 <url>
 <loc>http://epistasislab.github.io/tpot/installing/</loc>
-<lastmod>2019-11-05</lastmod>
+<lastmod>2020-01-03</lastmod>
 <changefreq>daily</changefreq>
 </url>
 
 
 
 <url>
 <loc>http://epistasislab.github.io/tpot/using/</loc>
-<lastmod>2019-11-05</lastmod>
+<lastmod>2020-01-03</lastmod>
 <changefreq>daily</changefreq>
 </url>
 
 
 
 <url>
 <loc>http://epistasislab.github.io/tpot/api/</loc>
-<lastmod>2019-11-05</lastmod>
+<lastmod>2020-01-03</lastmod>
 <changefreq>daily</changefreq>
 </url>
 
 
 
 <url>
 <loc>http://epistasislab.github.io/tpot/examples/</loc>
-<lastmod>2019-11-05</lastmod>
+<lastmod>2020-01-03</lastmod>
 <changefreq>daily</changefreq>
 </url>
 
 
 
 <url>
 <loc>http://epistasislab.github.io/tpot/contributing/</loc>
-<lastmod>2019-11-05</lastmod>
+<lastmod>2020-01-03</lastmod>
 <changefreq>daily</changefreq>
 </url>
 
 
 
 <url>
 <loc>http://epistasislab.github.io/tpot/releases/</loc>
-<lastmod>2019-11-05</lastmod>
+<lastmod>2020-01-03</lastmod>
 <changefreq>daily</changefreq>
 </url>
 
 
 
 <url>
 <loc>http://epistasislab.github.io/tpot/citing/</loc>
-<lastmod>2019-11-05</lastmod>
+<lastmod>2020-01-03</lastmod>
 <changefreq>daily</changefreq>
 </url>
 
 
 
 <url>
 <loc>http://epistasislab.github.io/tpot/support/</loc>
-<lastmod>2019-11-05</lastmod>
+<lastmod>2020-01-03</lastmod>
 <changefreq>daily</changefreq>
 </url>
 
 
 
 <url>
 <loc>http://epistasislab.github.io/tpot/related/</loc>
-<lastmod>2019-11-05</lastmod>
+<lastmod>2020-01-03</lastmod>
 <changefreq>daily</changefreq>
 </url>
 

docs/using/index.html

+1 -1
@@ -661,7 +661,7 @@ <h1 id="template-option-in-tpot">Template option in TPOT</h1>
 
 <p>If a specific operator, e.g. <code>SelectPercentile</code>, is preferred for usage in the 1st step of the pipeline, the template can be defined like 'SelectPercentile-Transformer-Classifier'.</p>
 <h1 id="featuresetselector-in-tpot">FeatureSetSelector in TPOT</h1>
-<p><code>FeatureSetSelector</code> is a special new operator in TPOT. This operator enables feature selection based on <em>priori</em> export knowledge. For example, in RNA-seq gene expression analysis, this operator can be used to select one or more gene (feature) set(s) based on GO (Gene Ontology) terms or annotated gene sets Molecular Signatures Database (<a href="http://software.broadinstitute.org/gsea/msigdb/index.jsp">MSigDB</a>) in the 1st step of pipeline via <code>template</code> option above, in order to reduce dimensions and TPOT computation time. This operator requires a dataset list in csv format. In this csv file, there are only three columns: 1st column is feature set names, 2nd column is the total number of features in one set and 3rd column is a list of feature names (if input X is pandas.DataFrame) or indexes (if input X is numpy.ndarray) delimited by ";". Below is a example how to use this operator in TPOT.</p>
+<p><code>FeatureSetSelector</code> is a special new operator in TPOT. This operator enables feature selection based on <em>priori</em> expert knowledge. For example, in RNA-seq gene expression analysis, this operator can be used to select one or more gene (feature) set(s) based on GO (Gene Ontology) terms or annotated gene sets Molecular Signatures Database (<a href="http://software.broadinstitute.org/gsea/msigdb/index.jsp">MSigDB</a>) in the 1st step of pipeline via <code>template</code> option above, in order to reduce dimensions and TPOT computation time. This operator requires a dataset list in csv format. In this csv file, there are only three columns: 1st column is feature set names, 2nd column is the total number of features in one set and 3rd column is a list of feature names (if input X is pandas.DataFrame) or indexes (if input X is numpy.ndarray) delimited by ";". Below is a example how to use this operator in TPOT.</p>
 <p>Please check our <a href="https://www.biorxiv.org/content/10.1101/502484v1.article-info">preprint paper</a> for more details.</p>
 <pre><code class="Python">from tpot import TPOTClassifier
 import numpy as np

docs_sources/using.md

+1 -1
@@ -550,7 +550,7 @@ If a specific operator, e.g. `SelectPercentile`, is preferred for usage in the 1
 
 # FeatureSetSelector in TPOT
 
-`FeatureSetSelector` is a special new operator in TPOT. This operator enables feature selection based on *priori* export knowledge. For example, in RNA-seq gene expression analysis, this operator can be used to select one or more gene (feature) set(s) based on GO (Gene Ontology) terms or annotated gene sets Molecular Signatures Database ([MSigDB](http://software.broadinstitute.org/gsea/msigdb/index.jsp)) in the 1st step of pipeline via `template` option above, in order to reduce dimensions and TPOT computation time. This operator requires a dataset list in csv format. In this csv file, there are only three columns: 1st column is feature set names, 2nd column is the total number of features in one set and 3rd column is a list of feature names (if input X is pandas.DataFrame) or indexes (if input X is numpy.ndarray) delimited by ";". Below is a example how to use this operator in TPOT.
+`FeatureSetSelector` is a special new operator in TPOT. This operator enables feature selection based on *priori* expert knowledge. For example, in RNA-seq gene expression analysis, this operator can be used to select one or more gene (feature) set(s) based on GO (Gene Ontology) terms or annotated gene sets Molecular Signatures Database ([MSigDB](http://software.broadinstitute.org/gsea/msigdb/index.jsp)) in the 1st step of pipeline via `template` option above, in order to reduce dimensions and TPOT computation time. This operator requires a dataset list in csv format. In this csv file, there are only three columns: 1st column is feature set names, 2nd column is the total number of features in one set and 3rd column is a list of feature names (if input X is pandas.DataFrame) or indexes (if input X is numpy.ndarray) delimited by ";". Below is a example how to use this operator in TPOT.
 
 Please check our [preprint paper](https://www.biorxiv.org/content/10.1101/502484v1.article-info) for more details.
 
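Note on the hunk above: the three-column CSV layout that the FeatureSetSelector paragraph describes (set name, number of features, ";"-delimited feature names) can be sketched with the standard library alone. This is only an illustration under stated assumptions: the helper `write_feature_sets`, the set names, and the gene names are hypothetical, and the header row with its column names is an assumption that this diff does not confirm.

```python
import csv
import io


def write_feature_sets(feature_sets, out):
    """Write one row per feature set: name, feature count, ';'-joined features."""
    writer = csv.writer(out, lineterminator="\n")
    # Assumed header row; these column names are not taken from this diff.
    writer.writerow(["Subset", "Size", "Features"])
    for name, features in feature_sets.items():
        writer.writerow([name, len(features), ";".join(features)])


# Hypothetical gene sets, for illustration only.
feature_sets = {
    "set_a": ["GENE1", "GENE2", "GENE3"],
    "set_b": ["GENE4", "GENE5"],
}

buffer = io.StringIO()
write_feature_sets(feature_sets, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Feature names are used here because the paragraph ties names to a pandas.DataFrame input; for a numpy.ndarray input the third column would hold column indexes instead.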

optional-requirements.txt

+1 -1
@@ -1,3 +1,3 @@
-xgboost==0.6a2
+xgboost==0.90
 scikit-mdr==0.4.4
 skrebate==0.3.4

requirements.txt

+2 -2
@@ -1,10 +1,10 @@
 deap>=1.2
 nose==1.3.7
 numpy>=1.16.3
-scikit-learn>=0.21.0
+scikit-learn>=0.22.0
 scipy>=1.3.1
 tqdm>=4.36.1
 update-checker>=0.16
-stopit>=1.1.1
+stopit>=1.1.2
 pandas>=0.24.2
 joblib>=0.13.2

setup.py

+1 -1
@@ -37,7 +37,7 @@ def calculate_version():
     zip_safe=True,
     install_requires=['numpy>=1.16.3',
                       'scipy>=1.3.1',
-                      'scikit-learn>=0.21.0',
+                      'scikit-learn>=0.22.0',
                       'deap>=1.2',
                       'update_checker>=0.16',
                       'tqdm>=4.36.1',

tests/driver_tests.py

-2
@@ -296,8 +296,6 @@ def test_print_args(self):
 VERBOSITY = 1
 
 """
-        print
-
         self.assertEqual(_sort_lines(expected_output), _sort_lines(output))
 
 

tests/export_tests.py

+13 -12
@@ -71,7 +71,6 @@ def test_export_random_ind():
 import pandas as pd
 from sklearn.model_selection import train_test_split
 from sklearn.naive_bayes import BernoulliNB
-from tpot.export_utils import set_param_recursive
 
 # NOTE: Make sure that the outcome column is labeled 'target' in the data file
 tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64)
@@ -80,14 +79,14 @@ def test_export_random_ind():
             train_test_split(features, tpot_data['target'], random_state=39)
 
 exported_pipeline = BernoulliNB(alpha=1.0, fit_prior=False)
-# Fix random state for all the steps in exported pipeline
-set_param_recursive(exported_pipeline.steps, 'random_state', 39)
+# Fix random state in exported estimator
+if hasattr(exported_pipeline, 'random_state'):
+    setattr(exported_pipeline, 'random_state', 39)
 
 exported_pipeline.fit(training_features, training_target)
 results = exported_pipeline.predict(testing_features)
 """
     exported_code = export_pipeline(pipeline, tpot_obj.operators, tpot_obj._pset, random_state=tpot_obj.random_state)
-
     assert expected_code == exported_code
 
 
@@ -487,18 +486,17 @@ def test_export_pipeline_6():
     """Assert that exported_pipeline() generated a compile source file with random_state and data_file_path."""
 
     pipeline_string = (
-        'KNeighborsClassifier('
-        'input_matrix, '
-        'KNeighborsClassifier__n_neighbors=10, '
-        'KNeighborsClassifier__p=1, '
-        'KNeighborsClassifier__weights=uniform'
-        ')'
+        'DecisionTreeClassifier(SelectPercentile(input_matrix, SelectPercentile__percentile=20),'
+        'DecisionTreeClassifier__criterion=gini, DecisionTreeClassifier__max_depth=8,'
+        'DecisionTreeClassifier__min_samples_leaf=5, DecisionTreeClassifier__min_samples_split=5)'
     )
     pipeline = creator.Individual.from_string(pipeline_string, tpot_obj._pset)
     expected_code = """import numpy as np
 import pandas as pd
+from sklearn.feature_selection import SelectPercentile, f_classif
 from sklearn.model_selection import train_test_split
-from sklearn.neighbors import KNeighborsClassifier
+from sklearn.pipeline import make_pipeline
+from sklearn.tree import DecisionTreeClassifier
 from tpot.export_utils import set_param_recursive
 
 # NOTE: Make sure that the outcome column is labeled 'target' in the data file
@@ -507,7 +505,10 @@ def test_export_pipeline_6():
 training_features, testing_features, training_target, testing_target = \\
             train_test_split(features, tpot_data['target'], random_state=42)
 
-exported_pipeline = KNeighborsClassifier(n_neighbors=10, p=1, weights="uniform")
+exported_pipeline = make_pipeline(
+    SelectPercentile(score_func=f_classif, percentile=20),
+    DecisionTreeClassifier(criterion="gini", max_depth=8, min_samples_leaf=5, min_samples_split=5)
+)
 # Fix random state for all the steps in exported pipeline
 set_param_recursive(exported_pipeline.steps, 'random_state', 42)
 

tests/stacking_estimator_tests.py

+2 -2
@@ -78,7 +78,7 @@ def test_StackingEstimator_3():
 
     # test cv score
     cv_score = np.mean(cross_val_score(sklearn_pipeline, training_features, training_target, cv=3, scoring='accuracy'))
-    known_cv_score = 0.9472823753147593
+    known_cv_score = 0.9643652561247217
 
     assert np.allclose(known_cv_score, cv_score)
 
@@ -101,6 +101,6 @@ def test_StackingEstimator_4():
 
     # test cv score
     cv_score = np.mean(cross_val_score(sklearn_pipeline, training_features_r, training_target_r, cv=3, scoring='r2'))
-    known_cv_score = 0.7989564328211737
+    known_cv_score = 0.8216045257587923
 
     assert np.allclose(known_cv_score, cv_score)

tests/tpot_tests.py

+24 -5
@@ -58,7 +58,10 @@
 from joblib import Memory
 from sklearn.metrics import make_scorer, roc_auc_score
 from sklearn.base import BaseEstimator, ClassifierMixin, RegressorMixin, TransformerMixin
-from sklearn.feature_selection.base import SelectorMixin
+try:
+    from sklearn.feature_selection._base import SelectorMixin
+except ImportError:
+    from sklearn.feature_selection.base import SelectorMixin
 from deap import creator, gp
 from deap.tools import ParetoFront
 from nose.tools import nottest, assert_raises, assert_not_equal, assert_greater_equal, assert_equal, assert_in
@@ -965,7 +968,7 @@ def test_fit_4():
     assert tpot_obj.generations == 1000000
 
     # reset generations to 20 just in case that the failed test may take too much time
-    tpot_obj.generations == 20
+    tpot_obj.generations = 20
 
     tpot_obj.fit(training_features, training_target)
     assert tpot_obj._pop == []
@@ -988,7 +991,7 @@ def test_fit_5():
     assert tpot_obj.generations == 1000000
 
     # reset generations to 20 just in case that the failed test may take too much time
-    tpot_obj.generations == 20
+    tpot_obj.generations = 20
 
     tpot_obj.fit(training_features, training_target)
     assert tpot_obj._pop != []
@@ -1426,7 +1429,15 @@ def pareto_eq(ind1, ind2):
         sklearn_pipeline = tpot_obj._toolbox.compile(expr=deap_pipeline)
 
         try:
-            cv_scores = cross_val_score(sklearn_pipeline, training_features, training_target, cv=5, scoring='accuracy', verbose=0)
+            with warnings.catch_warnings():
+                warnings.simplefilter('ignore')
+                cv_scores = cross_val_score(sklearn_pipeline,
+                                            training_features,
+                                            training_target,
+                                            cv=5,
+                                            scoring='accuracy',
+                                            verbose=0,
+                                            error_score='raise')
             mean_cv_scores = np.mean(cv_scores)
         except Exception as e:
             mean_cv_scores = -float('inf')
@@ -1460,7 +1471,15 @@ def pareto_eq(ind1, ind2):
         sklearn_pipeline = tpot_obj._toolbox.compile(expr=deap_pipeline)
 
         try:
-            cv_scores = cross_val_score(sklearn_pipeline, training_features, training_target, cv=5, scoring='accuracy', verbose=0)
+            with warnings.catch_warnings():
+                warnings.simplefilter('ignore')
+                cv_scores = cross_val_score(sklearn_pipeline,
+                                            training_features,
+                                            training_target,
+                                            cv=5,
+                                            scoring='accuracy',
+                                            verbose=0,
+                                            error_score='raise')
             mean_cv_scores = np.mean(cv_scores)
         except Exception as e:
             mean_cv_scores = -float('inf')
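
Note on the test_fit_4 and test_fit_5 hunks above: the removed `tpot_obj.generations == 20` lines were comparisons whose result was discarded, so the attribute kept its earlier value of 1000000. A minimal sketch of the difference, using a hypothetical stand-in class (`FakeTPOT`) rather than a real TPOT object:

```python
class FakeTPOT:
    """Hypothetical stand-in for a TPOT object; only the attribute matters here."""

    def __init__(self):
        self.generations = 1000000


obj = FakeTPOT()

# Comparison: evaluates to False and the result is thrown away.
obj.generations == 20
after_comparison = obj.generations

# Assignment: actually changes the attribute.
obj.generations = 20
after_assignment = obj.generations

print(after_comparison, after_assignment)
```

Linters typically flag a bare comparison statement like this as a pointless or useless expression, which is one way such a slip can be caught before it reaches a test.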

tpot/_version.py

+1 -1
@@ -23,4 +23,4 @@
 
 """
 
-__version__ = '0.11.0'
+__version__ = '0.11.1'
