Merge pull request #990 from weixuanfu/v0_11_1

weixuanfu · web-flow · commit aea42a59b52f · 2020-01-03T12:56:21.000-05:00
v0.11.1 minor release
diff --git a/.travis.yml b/.travis.yml
@@ -12,13 +12,6 @@ matrix:
     env: PYTHON_VERSION="3.7"  COVERAGE="true"  DASK_ML_VERSION="1.0.0"
     before_install:
       - wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
-  - name: "Python 3.7 on macOS"
-    os: osx
-    osx_image: xcode10.2  # Python 3.7.2 running on macOS 10.14.3
-    language: shell       # 'language: python' is an error on Travis CI macOS
-    env: PYTHON_VERSION="3.7"  DASK_ML_VERSION="1.0.0"
-    before_install:
-      - wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh
 install: source ./ci/.travis_install.sh
 script: bash ./ci/.travis_test.sh
 after_success:
diff --git a/docs/examples/index.html b/docs/examples/index.html
@@ -203,7 +203,7 @@ <h2 id="overview">Overview</h2>
 <td>subscription prediction</td>
 <td>classification</td>
 <td align="center"><a href="https://archive.ics.uci.edu/ml/datasets/Bank+Marketing">link</a></td>
-<td align="center"><a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/Portuguese%20Bank%20Marketing/Portuguese%20Bank%20Marketing%20Stratergy.ipynb">link</a></td>
+<td align="center"><a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/Portuguese%20Bank%20Marketing/Portuguese%20Bank%20Marketing%20Strategy.ipynb">link</a></td>
 </tr>
 <tr>
 <td>MAGIC Gamma Telescope</td>
diff --git a/docs/index.html b/docs/index.html
@@ -213,5 +213,5 @@
 
 <!--
 MkDocs version : 0.17.2
-Build Date UTC : 2019-11-05 20:44:02
+Build Date UTC : 2020-01-03 17:34:52
 -->
diff --git a/docs/search/search_index.json b/docs/search/search_index.json
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
@@ -4,79 +4,79 @@
     
     <url>
      <loc>http://epistasislab.github.io/tpot/</loc>
-     <lastmod>2019-11-05</lastmod>
+     <lastmod>2020-01-03</lastmod>
      <changefreq>daily</changefreq>
     </url>
     
 
     
     <url>
      <loc>http://epistasislab.github.io/tpot/installing/</loc>
-     <lastmod>2019-11-05</lastmod>
+     <lastmod>2020-01-03</lastmod>
      <changefreq>daily</changefreq>
     </url>
     
 
     
     <url>
      <loc>http://epistasislab.github.io/tpot/using/</loc>
-     <lastmod>2019-11-05</lastmod>
+     <lastmod>2020-01-03</lastmod>
      <changefreq>daily</changefreq>
     </url>
     
 
     
     <url>
      <loc>http://epistasislab.github.io/tpot/api/</loc>
-     <lastmod>2019-11-05</lastmod>
+     <lastmod>2020-01-03</lastmod>
      <changefreq>daily</changefreq>
     </url>
     
 
     
     <url>
      <loc>http://epistasislab.github.io/tpot/examples/</loc>
-     <lastmod>2019-11-05</lastmod>
+     <lastmod>2020-01-03</lastmod>
      <changefreq>daily</changefreq>
     </url>
     
 
     
     <url>
      <loc>http://epistasislab.github.io/tpot/contributing/</loc>
-     <lastmod>2019-11-05</lastmod>
+     <lastmod>2020-01-03</lastmod>
      <changefreq>daily</changefreq>
     </url>
     
 
     
     <url>
      <loc>http://epistasislab.github.io/tpot/releases/</loc>
-     <lastmod>2019-11-05</lastmod>
+     <lastmod>2020-01-03</lastmod>
      <changefreq>daily</changefreq>
     </url>
     
 
     
     <url>
      <loc>http://epistasislab.github.io/tpot/citing/</loc>
-     <lastmod>2019-11-05</lastmod>
+     <lastmod>2020-01-03</lastmod>
      <changefreq>daily</changefreq>
     </url>
     
 
     
     <url>
      <loc>http://epistasislab.github.io/tpot/support/</loc>
-     <lastmod>2019-11-05</lastmod>
+     <lastmod>2020-01-03</lastmod>
      <changefreq>daily</changefreq>
     </url>
     
 
     
     <url>
      <loc>http://epistasislab.github.io/tpot/related/</loc>
-     <lastmod>2019-11-05</lastmod>
+     <lastmod>2020-01-03</lastmod>
      <changefreq>daily</changefreq>
     </url>
     
diff --git a/docs/using/index.html b/docs/using/index.html
@@ -661,7 +661,7 @@ <h1 id="template-option-in-tpot">Template option in TPOT</h1>
 
 <p>If a specific operator, e.g. <code>SelectPercentile</code>, is preferred for usage in the 1st step of the pipeline, the template can be defined like 'SelectPercentile-Transformer-Classifier'.</p>
 <h1 id="featuresetselector-in-tpot">FeatureSetSelector in TPOT</h1>
-<p><code>FeatureSetSelector</code> is a special new operator in TPOT. This operator enables feature selection based on <em>priori</em> export knowledge. For example, in RNA-seq gene expression analysis, this operator can be used to select one or more gene (feature) set(s) based on GO (Gene Ontology) terms or annotated gene sets Molecular Signatures Database (<a href="http://software.broadinstitute.org/gsea/msigdb/index.jsp">MSigDB</a>) in the 1st step of pipeline via <code>template</code> option above, in order to reduce dimensions and TPOT computation time. This operator requires a dataset list in csv format. In this csv file, there are only three columns: 1st column is feature set names, 2nd column is the total number of features in one set and 3rd column is a list of feature names (if input X is pandas.DataFrame) or indexes (if input X is numpy.ndarray) delimited by ";". Below is a example how to use this operator in TPOT.</p>
+<p><code>FeatureSetSelector</code> is a special new operator in TPOT. This operator enables feature selection based on <em>priori</em> expert knowledge. For example, in RNA-seq gene expression analysis, this operator can be used to select one or more gene (feature) set(s) based on GO (Gene Ontology) terms or annotated gene sets Molecular Signatures Database (<a href="http://software.broadinstitute.org/gsea/msigdb/index.jsp">MSigDB</a>) in the 1st step of pipeline via <code>template</code> option above, in order to reduce dimensions and TPOT computation time. This operator requires a dataset list in csv format. In this csv file, there are only three columns: 1st column is feature set names, 2nd column is the total number of features in one set and 3rd column is a list of feature names (if input X is pandas.DataFrame) or indexes (if input X is numpy.ndarray) delimited by ";". Below is a example how to use this operator in TPOT.</p>
 <p>Please check our <a href="https://www.biorxiv.org/content/10.1101/502484v1.article-info">preprint paper</a> for more details.</p>
 <pre><code class="Python">from tpot import TPOTClassifier
 import numpy as np
diff --git a/docs_sources/using.md b/docs_sources/using.md
@@ -550,7 +550,7 @@ If a specific operator, e.g. `SelectPercentile`, is preferred for usage in the 1
 
 # FeatureSetSelector in TPOT
 
-`FeatureSetSelector` is a special new operator in TPOT. This operator enables feature selection based on *priori* export knowledge. For example, in RNA-seq gene expression analysis, this operator can be used to select one or more gene (feature) set(s) based on GO (Gene Ontology) terms or annotated gene sets Molecular Signatures Database ([MSigDB](http://software.broadinstitute.org/gsea/msigdb/index.jsp)) in the 1st step of pipeline via `template` option above, in order to reduce dimensions and TPOT computation time. This operator requires a dataset list in csv format. In this csv file, there are only three columns: 1st column is feature set names, 2nd column is the total number of features in one set and 3rd column is a list of feature names (if input X is pandas.DataFrame) or indexes (if input X is numpy.ndarray) delimited by ";". Below is a example how to use this operator in TPOT.
+`FeatureSetSelector` is a special new operator in TPOT. This operator enables feature selection based on *priori* expert knowledge. For example, in RNA-seq gene expression analysis, this operator can be used to select one or more gene (feature) set(s) based on GO (Gene Ontology) terms or annotated gene sets Molecular Signatures Database ([MSigDB](http://software.broadinstitute.org/gsea/msigdb/index.jsp)) in the 1st step of pipeline via `template` option above, in order to reduce dimensions and TPOT computation time. This operator requires a dataset list in csv format. In this csv file, there are only three columns: 1st column is feature set names, 2nd column is the total number of features in one set and 3rd column is a list of feature names (if input X is pandas.DataFrame) or indexes (if input X is numpy.ndarray) delimited by ";". Below is a example how to use this operator in TPOT.
 
 Please check our [preprint paper](https://www.biorxiv.org/content/10.1101/502484v1.article-info) for more details.
 
diff --git a/optional-requirements.txt b/optional-requirements.txt
@@ -1,3 +1,3 @@
-xgboost==0.6a2
+xgboost==0.90
 scikit-mdr==0.4.4
 skrebate==0.3.4
diff --git a/requirements.txt b/requirements.txt
@@ -1,10 +1,10 @@
 deap>=1.2
 nose==1.3.7
 numpy>=1.16.3
-scikit-learn>=0.21.0
+scikit-learn>=0.22.0
 scipy>=1.3.1
 tqdm>=4.36.1
 update-checker>=0.16
-stopit>=1.1.1
+stopit>=1.1.2
 pandas>=0.24.2
 joblib>=0.13.2
diff --git a/setup.py b/setup.py
@@ -37,7 +37,7 @@ def calculate_version():
     zip_safe=True,
     install_requires=['numpy>=1.16.3',
                     'scipy>=1.3.1',
-                    'scikit-learn>=0.21.0',
+                    'scikit-learn>=0.22.0',
                     'deap>=1.2',
                     'update_checker>=0.16',
                     'tqdm>=4.36.1',
diff --git a/tests/driver_tests.py b/tests/driver_tests.py
@@ -296,8 +296,6 @@ def test_print_args(self):
 VERBOSITY           =     1
 
 """
-        print
-
         self.assertEqual(_sort_lines(expected_output), _sort_lines(output))
 
 
diff --git a/tests/export_tests.py b/tests/export_tests.py
@@ -71,7 +71,6 @@ def test_export_random_ind():
 import pandas as pd
 from sklearn.model_selection import train_test_split
 from sklearn.naive_bayes import BernoulliNB
-from tpot.export_utils import set_param_recursive
 
 # NOTE: Make sure that the outcome column is labeled 'target' in the data file
 tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64)
@@ -80,14 +79,14 @@ def test_export_random_ind():
             train_test_split(features, tpot_data['target'], random_state=39)
 
 exported_pipeline = BernoulliNB(alpha=1.0, fit_prior=False)
-# Fix random state for all the steps in exported pipeline
-set_param_recursive(exported_pipeline.steps, 'random_state', 39)
+# Fix random state in exported estimator
+if hasattr(exported_pipeline, 'random_state'):
+    setattr(exported_pipeline, 'random_state', 39)
 
 exported_pipeline.fit(training_features, training_target)
 results = exported_pipeline.predict(testing_features)
 """
     exported_code = export_pipeline(pipeline, tpot_obj.operators, tpot_obj._pset, random_state=tpot_obj.random_state)
-
     assert expected_code == exported_code
 
 
@@ -487,18 +486,17 @@ def test_export_pipeline_6():
     """Assert that exported_pipeline() generated a compile source file with random_state and data_file_path."""
 
     pipeline_string = (
-        'KNeighborsClassifier('
-        'input_matrix, '
-        'KNeighborsClassifier__n_neighbors=10, '
-        'KNeighborsClassifier__p=1, '
-        'KNeighborsClassifier__weights=uniform'
-        ')'
+        'DecisionTreeClassifier(SelectPercentile(input_matrix, SelectPercentile__percentile=20),'
+        'DecisionTreeClassifier__criterion=gini, DecisionTreeClassifier__max_depth=8,'
+        'DecisionTreeClassifier__min_samples_leaf=5, DecisionTreeClassifier__min_samples_split=5)'
     )
     pipeline = creator.Individual.from_string(pipeline_string, tpot_obj._pset)
     expected_code = """import numpy as np
 import pandas as pd
+from sklearn.feature_selection import SelectPercentile, f_classif
 from sklearn.model_selection import train_test_split
-from sklearn.neighbors import KNeighborsClassifier
+from sklearn.pipeline import make_pipeline
+from sklearn.tree import DecisionTreeClassifier
 from tpot.export_utils import set_param_recursive
 
 # NOTE: Make sure that the outcome column is labeled 'target' in the data file
@@ -507,7 +505,10 @@ def test_export_pipeline_6():
 training_features, testing_features, training_target, testing_target = \\
             train_test_split(features, tpot_data['target'], random_state=42)
 
-exported_pipeline = KNeighborsClassifier(n_neighbors=10, p=1, weights="uniform")
+exported_pipeline = make_pipeline(
+    SelectPercentile(score_func=f_classif, percentile=20),
+    DecisionTreeClassifier(criterion="gini", max_depth=8, min_samples_leaf=5, min_samples_split=5)
+)
 # Fix random state for all the steps in exported pipeline
 set_param_recursive(exported_pipeline.steps, 'random_state', 42)
 
diff --git a/tests/stacking_estimator_tests.py b/tests/stacking_estimator_tests.py
@@ -78,7 +78,7 @@ def test_StackingEstimator_3():
 
     # test cv score
     cv_score = np.mean(cross_val_score(sklearn_pipeline, training_features, training_target, cv=3, scoring='accuracy'))
-    known_cv_score = 0.9472823753147593
+    known_cv_score = 0.9643652561247217
 
     assert np.allclose(known_cv_score, cv_score)
 
@@ -101,6 +101,6 @@ def test_StackingEstimator_4():
 
     # test cv score
     cv_score = np.mean(cross_val_score(sklearn_pipeline, training_features_r, training_target_r, cv=3, scoring='r2'))
-    known_cv_score = 0.7989564328211737
+    known_cv_score = 0.8216045257587923
 
     assert np.allclose(known_cv_score, cv_score)
diff --git a/tests/tpot_tests.py b/tests/tpot_tests.py
@@ -58,7 +58,10 @@
 from joblib import Memory
 from sklearn.metrics import make_scorer, roc_auc_score
 from sklearn.base import BaseEstimator, ClassifierMixin, RegressorMixin, TransformerMixin
-from sklearn.feature_selection.base import SelectorMixin
+try:
+    from sklearn.feature_selection._base import SelectorMixin
+except ImportError:
+    from sklearn.feature_selection.base import SelectorMixin
 from deap import creator, gp
 from deap.tools import ParetoFront
 from nose.tools import nottest, assert_raises, assert_not_equal, assert_greater_equal, assert_equal, assert_in
@@ -965,7 +968,7 @@ def test_fit_4():
     assert tpot_obj.generations == 1000000
 
     # reset generations to 20 just in case that the failed test may take too much time
-    tpot_obj.generations == 20
+    tpot_obj.generations = 20
 
     tpot_obj.fit(training_features, training_target)
     assert tpot_obj._pop == []
@@ -988,7 +991,7 @@ def test_fit_5():
     assert tpot_obj.generations == 1000000
 
     # reset generations to 20 just in case that the failed test may take too much time
-    tpot_obj.generations == 20
+    tpot_obj.generations = 20
 
     tpot_obj.fit(training_features, training_target)
     assert tpot_obj._pop != []
@@ -1426,7 +1429,15 @@ def pareto_eq(ind1, ind2):
         sklearn_pipeline = tpot_obj._toolbox.compile(expr=deap_pipeline)
 
         try:
-            cv_scores = cross_val_score(sklearn_pipeline, training_features, training_target, cv=5, scoring='accuracy', verbose=0)
+            with warnings.catch_warnings():
+                warnings.simplefilter('ignore')
+                cv_scores = cross_val_score(sklearn_pipeline,
+                                            training_features,
+                                            training_target,
+                                            cv=5,
+                                            scoring='accuracy',
+                                            verbose=0,
+                                            error_score='raise')
             mean_cv_scores = np.mean(cv_scores)
         except Exception as e:
             mean_cv_scores = -float('inf')
@@ -1460,7 +1471,15 @@ def pareto_eq(ind1, ind2):
         sklearn_pipeline = tpot_obj._toolbox.compile(expr=deap_pipeline)
 
         try:
-            cv_scores = cross_val_score(sklearn_pipeline, training_features, training_target, cv=5, scoring='accuracy', verbose=0)
+            with warnings.catch_warnings():
+                warnings.simplefilter('ignore')
+                cv_scores = cross_val_score(sklearn_pipeline,
+                                            training_features,
+                                            training_target,
+                                            cv=5,
+                                            scoring='accuracy',
+                                            verbose=0,
+                                            error_score='raise')
             mean_cv_scores = np.mean(cv_scores)
         except Exception as e:
             mean_cv_scores = -float('inf')
diff --git a/tpot/_version.py b/tpot/_version.py
@@ -23,4 +23,4 @@
 
 """
 
-__version__ = '0.11.0'
+__version__ = '0.11.1'
diff --git a/tpot/base.py b/tpot/base.py
diff --git a/tpot/builtins/feature_set_selector.py b/tpot/builtins/feature_set_selector.py
diff --git a/tpot/builtins/stacking_estimator.py b/tpot/builtins/stacking_estimator.py
diff --git a/tpot/export_utils.py b/tpot/export_utils.py
diff --git a/tpot/gp_deap.py b/tpot/gp_deap.py
diff --git a/tpot/operator_utils.py b/tpot/operator_utils.py
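
Note on the tests/export_tests.py hunk above: when the exported object is a bare estimator (BernoulliNB) rather than a pipeline, the expected export no longer calls set_param_recursive on `.steps`. It guards with hasattr instead, presumably because a single estimator has no `steps` attribute. A minimal sketch of that guard follows. The classes `SeededModel` and `UnseededModel` and the helper `fix_random_state` are hypothetical stand-ins for illustration, not scikit-learn or TPOT code.

```python
class SeededModel:
    """Hypothetical stand-in for an estimator that exposes random_state."""

    def __init__(self, random_state=None):
        self.random_state = random_state


class UnseededModel:
    """Hypothetical stand-in for an estimator with no random_state attribute."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha


def fix_random_state(estimator, seed):
    # Same guard as in the expected exported code: only set the seed
    # when the object actually has a random_state attribute.
    if hasattr(estimator, 'random_state'):
        setattr(estimator, 'random_state', seed)
    return estimator


seeded = fix_random_state(SeededModel(), 39)
unseeded = fix_random_state(UnseededModel(), 39)

print(seeded.random_state)                  # 39
print(hasattr(unseeded, 'random_state'))    # False
```

The guard leaves objects without a `random_state` attribute untouched, so the same exported snippet works whether or not the estimator is seedable.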
