Commit ecf0697

update changelog + docs
1 parent 42918d2 commit ecf0697

7 files changed: +131 -138 lines changed

CHANGELOG.md

+18 -1
@@ -2,7 +2,24 @@
 
 ## [Unreleased](https://github.com/hackingmaterials/automatminer/tree/HEAD)
 
-[Full Changelog](https://github.com/hackingmaterials/automatminer/compare/v2019.01.25_beta...HEAD)
+[Full Changelog](https://github.com/hackingmaterials/automatminer/compare/v2019.01.26_beta...HEAD)
+
+**Closed issues:**
+
+- Nose ---\> unittest [\#171](https://github.com/hackingmaterials/automatminer/issues/171)
+- Fix benchmarking [\#170](https://github.com/hackingmaterials/automatminer/issues/170)
+- Should add to PyPi [\#168](https://github.com/hackingmaterials/automatminer/issues/168)
+- An adapter to run a single model [\#165](https://github.com/hackingmaterials/automatminer/issues/165)
+- Add option to remove specific features [\#159](https://github.com/hackingmaterials/automatminer/issues/159)
+- Analytics module needs tests [\#133](https://github.com/hackingmaterials/automatminer/issues/133)
+
+**Merged pull requests:**
+
+- Update codacy and circleCI configs [\#173](https://github.com/hackingmaterials/automatminer/pull/173) ([utf](https://github.com/utf))
+- Add optional to manually keep/remove features [\#172](https://github.com/hackingmaterials/automatminer/pull/172) ([utf](https://github.com/utf))
+
+## [v2019.01.26_beta](https://github.com/hackingmaterials/automatminer/tree/v2019.01.26_beta) (2019-01-26)
+[Full Changelog](https://github.com/hackingmaterials/automatminer/compare/v2019.01.25_beta...v2019.01.26_beta)
 
 **Closed issues:**
 
automatminer/__init__.py

+1 -1
@@ -7,4 +7,4 @@
 __author__ = 'Alex Dunn, Qi Wang, Alex Ganose, Daniel Dopp, Anubhav Jain'
 __author_email__ = '[email protected]'
 __license__ = 'Modified BSD'
-__version__ = '2019.01.26_beta'
+__version__ = '2019.02.02_beta'

docs/_sources/index.rst.txt

+33 -46
@@ -9,16 +9,21 @@
 .. image:: _static/logo_med.png
 :alt: server
 :align: center
-:width: 400px
+:width: 600px
 
 
-Automatminer is a tool for automatically creating complete machine learning pipelines for materials science, which includes automatic featurization with `matminer <https://github.com/hackingmaterials/matminer>`_, feature reduction, and an AutoML backend. Put in a materials dataset, get out a machine that predicts materials properties.
+Automatminer is a tool for *automatically* creating **complete** machine learning pipelines for materials science, including automatic featurization with `matminer <https://github.com/hackingmaterials/matminer>`_, feature reduction, and an AutoML backend. Put in a materials dataset, get out a machine that predicts materials properties.
 
+How it works
+=============
+
+Automatminer automatically decorates a dataset using hundreds of descriptor techniques from matminer's descriptor library, picks the most useful features for learning, and runs a separate AutoML pipeline using TPOT. Once a pipeline has been fit, it can be examined with skater's interpretability tools, summarized in a text file, saved to disk, or used to make new predictions.
 
 .. image:: _static/automatminer_big.jpg
 :alt: server
 :align: center
 
+
 Here's an example of training on known data, and extending the model to out of sample data.
 
 .. code-block:: python
@@ -33,7 +38,7 @@ Here's an example of training on known data, and extending the model to out of s
 predicted_df = pipe.predict(unknown_df, "band gap")
 
 
-Alternatively, run a nested cross validation benchmark on a known dataset, and then compare the results against your own ML models:
+Or, run a (relatively) rigorous nested cross validation benchmark on a known dataset, and then compare the results against your own ML models:
 
 .. code-block:: python
 
@@ -45,58 +50,40 @@ Alternatively, run a nested cross validation benchmark on a known dataset, and t
 
 
 
-automatminer is applicable to many problems
--------------------------------------------
+automatminer is widely applicable
+===========================================
 
 Automatminer can work with many kinds of data:
-* both computational and experimental data
-* small (~100 samples) to moderate (~100k samples) sized datasets
-* crystalline datasets
-* composition-only (i.e., unknown phases) datasets
-* datasets containing electronic bandstructures or density of states
-
-...Many kinds of target properties
-* electronic
-mechanical
-thermodynamic
-any other kind of property
-
-...And many featurization (descriptor) techniques:
-*list them*
-
-Automatminer automatically decorates a dataset using hundreds of descriptor techniques from matminer's descriptor library, picks the most useful features for learning, and runs a separate AutoML pipeline using TPOT. Once a pipeline has been fit, it can be examined with skater's interpretability tools, summarized in a text file, saved to disk, or used to make new predictions.
-
-
-Code Examples
-=============
+----------------------------------------------
+- both computational and experimental data
+- small (~100 samples) to moderate (~100k samples) sized datasets
+- crystalline datasets
+- composition-only (i.e., unknown phases) datasets
+- datasets containing electronic bandstructures or density of states
 
-The easiest (and most automatic) way to use automatminer is through the MatPipe object. First, fit the MatPipe to a dataframe containing materials objects such as chemical compositions (or pymatgen Structures) and some material target property.
-```python
+Many kinds of target properties:
+--------------------------------
+- electronic
+- mechanical
+- thermodynamic
+- any other kind of property
 
-```
+And many featurization (descriptor) techniques:
+-----------------------------------------------
+See `matminer's Table of Featurizers <https://hackingmaterials.github.io/matminer/featurizer_summary.html>`_ for a full (and growing) list.
 
-Now use your pipeline to predict the properties of some other data, such as a new composition or structure.
-```python
 
-```
 
-You can also use it to benchmark against other machine learning models with the `benchmark` method of MatPipe, which runs a Nested Cross Validation. The Nested CV scheme
-is typically a more robust way of estimating an ML pipeline's generalizaiton error than a simple train/validation/test split.
-```python
-from automatminer.pipeline import MatPipe
-from sklearn.model_selection import KFold
-
-pipe = MatPipe()
-predictions_per_fold = pipe.benchmark(df, "bulk modulus", KFold(n_splits=5))
-```
-
-Once a MatPipe has been fit, you can examine it internally to see how it works using `pipe.digest()`; or pickle it for later with `pipe.save()`.
+Full Code Examples
+==================
 
-### Citing automatminer
-We are in the process of writing a paper for automatminer. In the meantime, please use the citation given in the matminer repo.
+Citing automatminer
+===================
+We are in the process of writing a paper for automatminer. In the meantime, please use the citation given in the `matminer repo <https://github.com/hackingmaterials/matminer>`_.
 
-## Contributing
-Interested in contributing? See our [contribution guidelines](https://github.com/hackingmaterials/automatminer/blob/master/CONTRIBUTING.md) and make a pull request! Please submit questions, issues / bug reports, and all other communication through the [matminer Google Group](https://groups.google.com/forum/#!forum/matminer).
+Contributing
+============
+Interested in contributing? See our `contribution guidelines <https://github.com/hackingmaterials/automatminer/blob/master/CONTRIBUTING.md>`_ and make a pull request! Please submit questions, issues / bug reports, and all other communication through the `matminer Google Group <https://groups.google.com/forum/#!forum/matminer>`_.
 
 
 
docs/index.html

+45 -43
@@ -6,7 +6,7 @@
 <head>
 <meta http-equiv="X-UA-Compatible" content="IE=Edge" />
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
-<title>automatminer is applicable to many problems &#8212; automatminer 2019.01.26_beta documentation</title>
+<title>How it works &#8212; automatminer 2019.01.26_beta documentation</title>
 <link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
 <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
 <script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
@@ -32,8 +32,11 @@
 <div class="body" role="main">
 
 <style> .red {color:#aa0060; font-weight:bold; font-size:16px} </style><p><span class="red">WARNING! These docs are incomplete. Read and use at your own risk!`</span></p>
-<a class="reference internal image-reference" href="_images/logo_med.png"><img alt="server" class="align-center" src="_images/logo_med.png" style="width: 400px;" /></a>
-<p>Automatminer is a tool for automatically creating complete machine learning pipelines for materials science, which includes automatic featurization with <a class="reference external" href="https://github.com/hackingmaterials/matminer">matminer</a>, feature reduction, and an AutoML backend. Put in a materials dataset, get out a machine that predicts materials properties.</p>
+<a class="reference internal image-reference" href="_images/logo_med.png"><img alt="server" class="align-center" src="_images/logo_med.png" style="width: 600px;" /></a>
+<p>Automatminer is a tool for <em>automatically</em> creating <strong>complete</strong> machine learning pipelines for materials science, including automatic featurization with <a class="reference external" href="https://github.com/hackingmaterials/matminer">matminer</a>, feature reduction, and an AutoML backend. Put in a materials dataset, get out a machine that predicts materials properties.</p>
+<div class="section" id="how-it-works">
+<h1>How it works<a class="headerlink" href="#how-it-works" title="Permalink to this headline"></a></h1>
+<p>Automatminer automatically decorates a dataset using hundreds of descriptor techniques from matminer’s descriptor library, picks the most useful features for learning, and runs a separate AutoML pipeline using TPOT. Once a pipeline has been fit, it can be examined with skater’s interpretability tools, summarized in a text file, saved to disk, or used to make new predictions.</p>
 <img alt="server" class="align-center" src="_images/automatminer_big.jpg" />
 <p>Here’s an example of training on known data, and extending the model to out of sample data.</p>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">automatminer.pipeline</span> <span class="kn">import</span> <span class="n">MatPipe</span>
@@ -46,60 +49,59 @@
 <span class="n">predicted_df</span> <span class="o">=</span> <span class="n">pipe</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">unknown_df</span><span class="p">,</span> <span class="s2">&quot;band gap&quot;</span><span class="p">)</span>
 </pre></div>
 </div>
-<p>Alternatively, run a nested cross validation benchmark on a known dataset, and then compare the results against your own ML models:</p>
+<p>Or, run a (relatively) rigorous nested cross validation benchmark on a known dataset, and then compare the results against your own ML models:</p>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">automatminer.pipeline</span> <span class="kn">import</span> <span class="n">MatPipe</span>
 <span class="kn">from</span> <span class="nn">sklearn.model_selection</span> <span class="kn">import</span> <span class="n">KFold</span>
 
 <span class="n">pipe</span> <span class="o">=</span> <span class="n">MatPipe</span><span class="p">()</span>
 <span class="n">predictions_per_fold</span> <span class="o">=</span> <span class="n">pipe</span><span class="o">.</span><span class="n">benchmark</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="s2">&quot;bulk modulus&quot;</span><span class="p">,</span> <span class="n">KFold</span><span class="p">(</span><span class="n">n_splits</span><span class="o">=</span><span class="mi">5</span><span class="p">))</span>
 </pre></div>
 </div>
-<div class="section" id="automatminer-is-applicable-to-many-problems">
-<h1>automatminer is applicable to many problems<a class="headerlink" href="#automatminer-is-applicable-to-many-problems" title="Permalink to this headline"></a></h1>
-<p>Automatminer can work with many kinds of data:
-* both computational and experimental data
-* small (~100 samples) to moderate (~100k samples) sized datasets
-* crystalline datasets
-* composition-only (i.e., unknown phases) datasets
-* datasets containing electronic bandstructures or density of states</p>
-<p>…Many kinds of target properties
-* electronic
-mechanical
-thermodynamic
-any other kind of property</p>
-<p>…And many featurization (descriptor) techniques:
-<em>list them</em></p>
-<p>Automatminer automatically decorates a dataset using hundreds of descriptor techniques from matminer’s descriptor library, picks the most useful features for learning, and runs a separate AutoML pipeline using TPOT. Once a pipeline has been fit, it can be examined with skater’s interpretability tools, summarized in a text file, saved to disk, or used to make new predictions.</p>
-<div class="section" id="code-examples">
-<h2>Code Examples<a class="headerlink" href="#code-examples" title="Permalink to this headline"></a></h2>
-<p>The easiest (and most automatic) way to use automatminer is through the MatPipe object. First, fit the MatPipe to a dataframe containing materials objects such as chemical compositions (or pymatgen Structures) and some material target property.
-<a href="#id1"><span class="problematic" id="id2">``</span></a><a href="#id3"><span class="problematic" id="id4">`</span></a>python</p>
-<p><a href="#id5"><span class="problematic" id="id6">``</span></a><a href="#id7"><span class="problematic" id="id8">`</span></a></p>
-<p>Now use your pipeline to predict the properties of some other data, such as a new composition or structure.
-<a href="#id9"><span class="problematic" id="id10">``</span></a><a href="#id11"><span class="problematic" id="id12">`</span></a>python</p>
-<p><a href="#id13"><span class="problematic" id="id14">``</span></a><a href="#id15"><span class="problematic" id="id16">`</span></a></p>
-<p>You can also use it to benchmark against other machine learning models with the <cite>benchmark</cite> method of MatPipe, which runs a Nested Cross Validation. The Nested CV scheme
-is typically a more robust way of estimating an ML pipeline’s generalizaiton error than a simple train/validation/test split.
-<a href="#id17"><span class="problematic" id="id18">``</span></a><a href="#id19"><span class="problematic" id="id20">`</span></a>python
-from automatminer.pipeline import MatPipe
-from sklearn.model_selection import KFold</p>
-<p>pipe = MatPipe()
-predictions_per_fold = pipe.benchmark(df, “bulk modulus”, KFold(n_splits=5))
-<a href="#id21"><span class="problematic" id="id22">``</span></a><a href="#id23"><span class="problematic" id="id24">`</span></a></p>
-<p>Once a MatPipe has been fit, you can examine it internally to see how it works using <cite>pipe.digest()</cite>; or pickle it for later with <cite>pipe.save()</cite>.</p>
-<p>### Citing automatminer
-We are in the process of writing a paper for automatminer. In the meantime, please use the citation given in the matminer repo.</p>
-<p>## Contributing
-Interested in contributing? See our [contribution guidelines](<a class="reference external" href="https://github.com/hackingmaterials/automatminer/blob/master/CONTRIBUTING.md">https://github.com/hackingmaterials/automatminer/blob/master/CONTRIBUTING.md</a>) and make a pull request! Please submit questions, issues / bug reports, and all other communication through the [matminer Google Group](<a class="reference external" href="https://groups.google.com/forum/#!forum/matminer">https://groups.google.com/forum/#!forum/matminer</a>).</p>
+</div>
+<div class="section" id="automatminer-is-widely-applicable">
+<h1>automatminer is widely applicable<a class="headerlink" href="#automatminer-is-widely-applicable" title="Permalink to this headline"></a></h1>
+<div class="section" id="automatminer-can-work-with-many-kinds-of-data">
+<h2>Automatminer can work with many kinds of data:<a class="headerlink" href="#automatminer-can-work-with-many-kinds-of-data" title="Permalink to this headline"></a></h2>
+<ul class="simple">
+<li>both computational and experimental data</li>
+<li>small (~100 samples) to moderate (~100k samples) sized datasets</li>
+<li>crystalline datasets</li>
+<li>composition-only (i.e., unknown phases) datasets</li>
+<li>datasets containing electronic bandstructures or density of states</li>
+</ul>
+</div>
+<div class="section" id="many-kinds-of-target-properties">
+<h2>Many kinds of target properties:<a class="headerlink" href="#many-kinds-of-target-properties" title="Permalink to this headline"></a></h2>
+<ul class="simple">
+<li>electronic</li>
+<li>mechanical</li>
+<li>thermodynamic</li>
+<li>any other kind of property</li>
+</ul>
+</div>
+<div class="section" id="and-many-featurization-descriptor-techniques">
+<h2>And many featurization (descriptor) techniques:<a class="headerlink" href="#and-many-featurization-descriptor-techniques" title="Permalink to this headline"></a></h2>
+<p>See <a class="reference external" href="https://hackingmaterials.github.io/matminer/featurizer_summary.html">matminer’s Table of Featurizers</a> for a full (and growing) list.</p>
+</div>
+</div>
+<div class="section" id="full-code-examples">
+<h1>Full Code Examples<a class="headerlink" href="#full-code-examples" title="Permalink to this headline"></a></h1>
+</div>
+<div class="section" id="citing-automatminer">
+<h1>Citing automatminer<a class="headerlink" href="#citing-automatminer" title="Permalink to this headline"></a></h1>
+<p>We are in the process of writing a paper for automatminer. In the meantime, please use the citation given in the <a class="reference external" href="https://github.com/hackingmaterials/matminer">matminer repo</a>.</p>
+</div>
+<div class="section" id="contributing">
+<h1>Contributing<a class="headerlink" href="#contributing" title="Permalink to this headline"></a></h1>
+<p>Interested in contributing? See our <a class="reference external" href="https://github.com/hackingmaterials/automatminer/blob/master/CONTRIBUTING.md">contribution guidelines</a> and make a pull request! Please submit questions, issues / bug reports, and all other communication through the <a class="reference external" href="https://groups.google.com/forum/#!forum/matminer">matminer Google Group</a>.</p>
 </div>
 <div class="section" id="indices-and-tables">
-<h2>Indices and tables<a class="headerlink" href="#indices-and-tables" title="Permalink to this headline"></a></h2>
+<h1>Indices and tables<a class="headerlink" href="#indices-and-tables" title="Permalink to this headline"></a></h1>
 <ul class="simple">
 <li><a class="reference internal" href="genindex.html"><span class="std std-ref">Index</span></a></li>
 <li><a class="reference internal" href="py-modindex.html"><span class="std std-ref">Module Index</span></a></li>
 <li><a class="reference internal" href="search.html"><span class="std std-ref">Search Page</span></a></li>
 </ul>
-</div>
 </div>
 
 
docs/objects.inv

-25 Bytes
Binary file not shown.
