|
6 | 6 | <head>
|
7 | 7 | <meta http-equiv="X-UA-Compatible" content="IE=Edge" />
|
8 | 8 | <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
|
9 |
| - <title>automatminer is applicable to many problems — automatminer 2019.01.26_beta documentation</title> |
| 9 | + <title>How it works — automatminer 2019.01.26_beta documentation</title> |
10 | 10 | <link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
|
11 | 11 | <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
|
12 | 12 | <script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
|
|
32 | 32 | <div class="body" role="main">
|
33 | 33 |
|
34 | 34 | <style> .red {color:#aa0060; font-weight:bold; font-size:16px} </style><p><span class="red">WARNING! These docs are incomplete. Read and use at your own risk!</span></p>
|
35 |
| -<a class="reference internal image-reference" href="_images/logo_med.png"><img alt="server" class="align-center" src="_images/logo_med.png" style="width: 400px;" /></a> |
36 |
| -<p>Automatminer is a tool for automatically creating complete machine learning pipelines for materials science, which includes automatic featurization with <a class="reference external" href="https://github.com/hackingmaterials/matminer">matminer</a>, feature reduction, and an AutoML backend. Put in a materials dataset, get out a machine that predicts materials properties.</p> |
| 35 | +<a class="reference internal image-reference" href="_images/logo_med.png"><img alt="server" class="align-center" src="_images/logo_med.png" style="width: 600px;" /></a> |
| 36 | +<p>Automatminer is a tool for <em>automatically</em> creating <strong>complete</strong> machine learning pipelines for materials science, including automatic featurization with <a class="reference external" href="https://github.com/hackingmaterials/matminer">matminer</a>, feature reduction, and an AutoML backend. Put in a materials dataset, get out a machine that predicts materials properties.</p> |
| 37 | +<div class="section" id="how-it-works"> |
| 38 | +<h1>How it works<a class="headerlink" href="#how-it-works" title="Permalink to this headline">¶</a></h1> |
| 39 | +<p>Automatminer automatically decorates a dataset using hundreds of descriptor techniques from matminer’s descriptor library, picks the most useful features for learning, and runs a separate AutoML pipeline using TPOT. Once a pipeline has been fit, it can be examined with skater’s interpretability tools, summarized in a text file, saved to disk, or used to make new predictions.</p> |
37 | 40 | <img alt="server" class="align-center" src="_images/automatminer_big.jpg" />
|
38 | 41 | <p>Here’s an example of training on known data and extending the model to out-of-sample data.</p>
|
39 | 42 | <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">automatminer.pipeline</span> <span class="kn">import</span> <span class="n">MatPipe</span>
|
|
46 | 49 | <span class="n">predicted_df</span> <span class="o">=</span> <span class="n">pipe</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">unknown_df</span><span class="p">,</span> <span class="s2">"band gap"</span><span class="p">)</span>
|
47 | 50 | </pre></div>
|
48 | 51 | </div>
|
49 |
| -<p>Alternatively, run a nested cross validation benchmark on a known dataset, and then compare the results against your own ML models:</p> |
| 52 | +<p>Or, run a (relatively) rigorous nested cross validation benchmark on a known dataset, and then compare the results against your own ML models:</p> |
50 | 53 | <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">automatminer.pipeline</span> <span class="kn">import</span> <span class="n">MatPipe</span>
|
51 | 54 | <span class="kn">from</span> <span class="nn">sklearn.model_selection</span> <span class="kn">import</span> <span class="n">KFold</span>
|
52 | 55 |
|
53 | 56 | <span class="n">pipe</span> <span class="o">=</span> <span class="n">MatPipe</span><span class="p">()</span>
|
54 | 57 | <span class="n">predictions_per_fold</span> <span class="o">=</span> <span class="n">pipe</span><span class="o">.</span><span class="n">benchmark</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="s2">"bulk modulus"</span><span class="p">,</span> <span class="n">KFold</span><span class="p">(</span><span class="n">n_splits</span><span class="o">=</span><span class="mi">5</span><span class="p">))</span>
|
55 | 58 | </pre></div>
|
56 | 59 | </div>
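The nested cross validation idea behind the benchmark call above can be sketched with scikit-learn alone. This is a minimal illustration, not automatminer's implementation: the synthetic dataset, the `Ridge` model, and the `alpha` grid are all assumptions chosen to keep the example small. An inner loop selects hyperparameters on each training split, and an outer loop scores the selected model on data it never saw.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for a featurized materials dataset (assumption).
X, y = make_regression(n_samples=120, n_features=8, noise=5.0, random_state=0)

# Outer folds estimate generalization error; inner folds tune the model.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)

# Hypothetical model and grid; any estimator and search space would do.
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.1, 1.0, 10.0]},
    cv=inner_cv,
    scoring="neg_mean_absolute_error",
)

predictions_per_fold = []
fold_mae = []
for train_idx, test_idx in outer_cv.split(X):
    # Hyperparameters are chosen using only the outer training split.
    search.fit(X[train_idx], y[train_idx])
    # The refit best model is then scored on the held-out outer fold.
    predictions = search.predict(X[test_idx])
    predictions_per_fold.append(predictions)
    fold_mae.append(float(np.mean(np.abs(predictions - y[test_idx]))))

print("MAE per outer fold:", fold_mae)
```

Because the held-out outer fold never influences hyperparameter selection, the per-fold errors are a less optimistic estimate than a score taken from the same split used for tuning.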
|
57 |
| -<div class="section" id="automatminer-is-applicable-to-many-problems"> |
58 |
| -<h1>automatminer is applicable to many problems<a class="headerlink" href="#automatminer-is-applicable-to-many-problems" title="Permalink to this headline">¶</a></h1> |
59 |
| -<p>Automatminer can work with many kinds of data: |
60 |
| -* both computational and experimental data |
61 |
| -* small (~100 samples) to moderate (~100k samples) sized datasets |
62 |
| -* crystalline datasets |
63 |
| -* composition-only (i.e., unknown phases) datasets |
64 |
| -* datasets containing electronic bandstructures or density of states</p> |
65 |
| -<p>…Many kinds of target properties |
66 |
| -* electronic |
67 |
| -mechanical |
68 |
| -thermodynamic |
69 |
| -any other kind of property</p> |
70 |
| -<p>…And many featurization (descriptor) techniques: |
71 |
| -<em>list them</em></p> |
72 |
| -<p>Automatminer automatically decorates a dataset using hundreds of descriptor techniques from matminer’s descriptor library, picks the most useful features for learning, and runs a separate AutoML pipeline using TPOT. Once a pipeline has been fit, it can be examined with skater’s interpretability tools, summarized in a text file, saved to disk, or used to make new predictions.</p> |
73 |
| -<div class="section" id="code-examples"> |
74 |
| -<h2>Code Examples<a class="headerlink" href="#code-examples" title="Permalink to this headline">¶</a></h2> |
75 |
| -<p>The easiest (and most automatic) way to use automatminer is through the MatPipe object. First, fit the MatPipe to a dataframe containing materials objects such as chemical compositions (or pymatgen Structures) and some material target property. |
76 |
| -<a href="#id1"><span class="problematic" id="id2">``</span></a><a href="#id3"><span class="problematic" id="id4">`</span></a>python</p> |
77 |
| -<p><a href="#id5"><span class="problematic" id="id6">``</span></a><a href="#id7"><span class="problematic" id="id8">`</span></a></p> |
78 |
| -<p>Now use your pipeline to predict the properties of some other data, such as a new composition or structure. |
79 |
| -<a href="#id9"><span class="problematic" id="id10">``</span></a><a href="#id11"><span class="problematic" id="id12">`</span></a>python</p> |
80 |
| -<p><a href="#id13"><span class="problematic" id="id14">``</span></a><a href="#id15"><span class="problematic" id="id16">`</span></a></p> |
81 |
| -<p>You can also use it to benchmark against other machine learning models with the <cite>benchmark</cite> method of MatPipe, which runs a Nested Cross Validation. The Nested CV scheme |
82 |
| -is typically a more robust way of estimating an ML pipeline’s generalizaiton error than a simple train/validation/test split. |
83 |
| -<a href="#id17"><span class="problematic" id="id18">``</span></a><a href="#id19"><span class="problematic" id="id20">`</span></a>python |
84 |
| -from automatminer.pipeline import MatPipe |
85 |
| -from sklearn.model_selection import KFold</p> |
86 |
| -<p>pipe = MatPipe() |
87 |
| -predictions_per_fold = pipe.benchmark(df, “bulk modulus”, KFold(n_splits=5)) |
88 |
| -<a href="#id21"><span class="problematic" id="id22">``</span></a><a href="#id23"><span class="problematic" id="id24">`</span></a></p> |
89 |
| -<p>Once a MatPipe has been fit, you can examine it internally to see how it works using <cite>pipe.digest()</cite>; or pickle it for later with <cite>pipe.save()</cite>.</p> |
90 |
| -<p>### Citing automatminer |
91 |
| -We are in the process of writing a paper for automatminer. In the meantime, please use the citation given in the matminer repo.</p> |
92 |
| -<p>## Contributing |
93 |
| -Interested in contributing? See our [contribution guidelines](<a class="reference external" href="https://github.com/hackingmaterials/automatminer/blob/master/CONTRIBUTING.md">https://github.com/hackingmaterials/automatminer/blob/master/CONTRIBUTING.md</a>) and make a pull request! Please submit questions, issues / bug reports, and all other communication through the [matminer Google Group](<a class="reference external" href="https://groups.google.com/forum/#!forum/matminer">https://groups.google.com/forum/#!forum/matminer</a>).</p> |
| 60 | +</div> |
| 61 | +<div class="section" id="automatminer-is-widely-applicable"> |
| 62 | +<h1>automatminer is widely applicable<a class="headerlink" href="#automatminer-is-widely-applicable" title="Permalink to this headline">¶</a></h1> |
| 63 | +<div class="section" id="automatminer-can-work-with-many-kinds-of-data"> |
| 64 | +<h2>Automatminer can work with many kinds of data:<a class="headerlink" href="#automatminer-can-work-with-many-kinds-of-data" title="Permalink to this headline">¶</a></h2> |
| 65 | +<ul class="simple"> |
| 66 | +<li>both computational and experimental data</li> |
| 67 | +<li>small (~100 samples) to moderate (~100k samples) sized datasets</li> |
| 68 | +<li>crystalline datasets</li> |
| 69 | +<li>composition-only (i.e., unknown phases) datasets</li> |
| 70 | +<li>datasets containing electronic bandstructures or density of states</li> |
| 71 | +</ul> |
| 72 | +</div> |
| 73 | +<div class="section" id="many-kinds-of-target-properties"> |
| 74 | +<h2>Many kinds of target properties:<a class="headerlink" href="#many-kinds-of-target-properties" title="Permalink to this headline">¶</a></h2> |
| 75 | +<ul class="simple"> |
| 76 | +<li>electronic</li> |
| 77 | +<li>mechanical</li> |
| 78 | +<li>thermodynamic</li> |
| 79 | +<li>any other kind of property</li> |
| 80 | +</ul> |
| 81 | +</div> |
| 82 | +<div class="section" id="and-many-featurization-descriptor-techniques"> |
| 83 | +<h2>And many featurization (descriptor) techniques:<a class="headerlink" href="#and-many-featurization-descriptor-techniques" title="Permalink to this headline">¶</a></h2> |
| 84 | +<p>See <a class="reference external" href="https://hackingmaterials.github.io/matminer/featurizer_summary.html">matminer’s Table of Featurizers</a> for a full (and growing) list.</p> |
| 85 | +</div> |
| 86 | +</div> |
| 87 | +<div class="section" id="full-code-examples"> |
| 88 | +<h1>Full Code Examples<a class="headerlink" href="#full-code-examples" title="Permalink to this headline">¶</a></h1> |
| 89 | +</div> |
| 90 | +<div class="section" id="citing-automatminer"> |
| 91 | +<h1>Citing automatminer<a class="headerlink" href="#citing-automatminer" title="Permalink to this headline">¶</a></h1> |
| 92 | +<p>We are in the process of writing a paper for automatminer. In the meantime, please use the citation given in the <a class="reference external" href="https://github.com/hackingmaterials/matminer">matminer repo</a>.</p> |
| 93 | +</div> |
| 94 | +<div class="section" id="contributing"> |
| 95 | +<h1>Contributing<a class="headerlink" href="#contributing" title="Permalink to this headline">¶</a></h1> |
| 96 | +<p>Interested in contributing? See our <a class="reference external" href="https://github.com/hackingmaterials/automatminer/blob/master/CONTRIBUTING.md">contribution guidelines</a> and make a pull request! Please submit questions, issues / bug reports, and all other communication through the <a class="reference external" href="https://groups.google.com/forum/#!forum/matminer">matminer Google Group</a>.</p> |
94 | 97 | </div>
|
95 | 98 | <div class="section" id="indices-and-tables">
|
96 |
| -<h2>Indices and tables<a class="headerlink" href="#indices-and-tables" title="Permalink to this headline">¶</a></h2> |
| 99 | +<h1>Indices and tables<a class="headerlink" href="#indices-and-tables" title="Permalink to this headline">¶</a></h1> |
97 | 100 | <ul class="simple">
|
98 | 101 | <li><a class="reference internal" href="genindex.html"><span class="std std-ref">Index</span></a></li>
|
99 | 102 | <li><a class="reference internal" href="py-modindex.html"><span class="std std-ref">Module Index</span></a></li>
|
100 | 103 | <li><a class="reference internal" href="search.html"><span class="std std-ref">Search Page</span></a></li>
|
101 | 104 | </ul>
|
102 |
| -</div> |
103 | 105 | </div>
|
104 | 106 |
|
105 | 107 |
|
|